MongoDB聚合:无需$lookup关联合并同结构多集合并排序
Got it, let's tackle this problem head-on! You want to combine matching documents from multiple same-structured collections (like Jobs.Current and Jobs.Finished), then run aggregation operations (like sorting) on the combined result set. And you're right—$lookup isn't the right tool here because it tacks on related documents as an array, rather than flattening them into your main result stream.
The Best Solution: Use $unionWith (MongoDB 4.4+)
MongoDB's $unionWith aggregation stage was built exactly for this scenario. It lets you merge documents from another collection into your current aggregation pipeline, returning a flat set of documents that you can then filter, sort, or aggregate further just like they came from a single collection.
Here's a concrete example that matches your use case:
db.Jobs.Current.aggregate([ // First, filter documents from the Current collection { $match: { status: { $in: ["active", "pending"] } } }, // Merge in matching documents from the Finished collection { $unionWith: { coll: "Jobs.Finished", pipeline: [ // Add your filter for the Finished collection here { $match: { completedAt: { $gte: ISODate("2024-01-01T00:00:00Z") } } } ] } }, // Now sort the combined result set (or add any other aggregation steps) { $sort: { createdAt: -1 } }, // Optional: Limit results for pagination { $limit: 20 } ])
Key perks of $unionWith:
- It returns a flat stream of documents, not nested arrays—perfect for your post-aggregation needs.
- You can apply collection-specific filters directly in the
pipelineparameter of$unionWith. - It's optimized for performance, especially compared to workarounds for older MongoDB versions.
For MongoDB Versions < 4.4: Use $facet + $concatArrays
If you're stuck on an older MongoDB version that doesn't support $unionWith, you can use a combination of $facet to query each collection separately, then $concatArrays to merge the results. Note that this is less efficient for large datasets, as it loads all matching documents into memory first.
Example:
db.aggregate([ { $facet: { // Fetch filtered documents from Jobs.Current currentJobs: [ { $lookup: { from: "Jobs.Current", pipeline: [{ $match: { status: { $in: ["active", "pending"] } } }], as: "docs" } }, { $unwind: "$docs" }, { $replaceRoot: { newRoot: "$docs" } } ], // Fetch filtered documents from Jobs.Finished finishedJobs: [ { $lookup: { from: "Jobs.Finished", pipeline: [{ $match: { completedAt: { $gte: ISODate("2024-01-01T00:00:00Z") } } }], as: "docs" } }, { $unwind: "$docs" }, { $replaceRoot: { newRoot: "$docs" } } ] } }, // Merge the two result arrays into one { $project: { combinedJobs: { $concatArrays: ["$currentJobs", "$finishedJobs"] } } }, // Flatten the combined array into individual documents { $unwind: "$combinedJobs" }, // Set the merged documents as the root of the pipeline { $replaceRoot: { newRoot: "$combinedJobs" } }, // Sort the final result set { $sort: { createdAt: -1 } }, { $limit: 20 } ])
Quick Notes
- Make sure your collections have consistent structures—if some fields are missing in one collection, the merged results will reflect that (you can use
$addFieldsto fill in defaults if needed). - If you have duplicate
_idvalues across collections and want to de-duplicate, add a$groupstage after merging to keep only unique documents.
内容的提问来源于stack exchange,提问作者Nick




