MongoDB聚合：无需$lookup关联合并同结构多集合并排序

阿华AIGC实验室

2026-5-8

How to Merge Documents from Multiple Collections Before Aggregation in MongoDB

Got it, let's tackle this problem head-on! You want to combine matching documents from multiple same-structured collections (like Jobs.Current and Jobs.Finished), then run aggregation operations (like sorting) on the combined result set. And you're right—$lookup isn't the right tool here because it tacks on related documents as an array, rather than flattening them into your main result stream.

The Best Solution: Use `$unionWith` (MongoDB 4.4+)

MongoDB's $unionWith aggregation stage was built exactly for this scenario. It lets you merge documents from another collection into your current aggregation pipeline, returning a flat set of documents that you can then filter, sort, or aggregate further just like they came from a single collection.

Here's a concrete example that matches your use case:

db.Jobs.Current.aggregate([
  // First, filter documents from the Current collection
  { $match: { status: { $in: ["active", "pending"] } } },
  // Merge in matching documents from the Finished collection
  {
    $unionWith: {
      coll: "Jobs.Finished",
      pipeline: [
        // Add your filter for the Finished collection here
        { $match: { completedAt: { $gte: ISODate("2024-01-01T00:00:00Z") } } }
      ]
    }
  },
  // Now sort the combined result set (or add any other aggregation steps)
  { $sort: { createdAt: -1 } },
  // Optional: Limit results for pagination
  { $limit: 20 }
])

Key perks of $unionWith:

It returns a flat stream of documents, not nested arrays—perfect for your post-aggregation needs.
You can apply collection-specific filters directly in the pipeline parameter of $unionWith.
It's optimized for performance, especially compared to workarounds for older MongoDB versions.

For MongoDB Versions < 4.4: Use `$facet` + `$concatArrays`

If you're stuck on an older MongoDB version that doesn't support $unionWith, you can use a combination of $facet to query each collection separately, then $concatArrays to merge the results. Note that this is less efficient for large datasets, as it loads all matching documents into memory first.

Example:

db.aggregate([
  {
    $facet: {
      // Fetch filtered documents from Jobs.Current
      currentJobs: [
        { $lookup: { from: "Jobs.Current", pipeline: [{ $match: { status: { $in: ["active", "pending"] } } }], as: "docs" } },
        { $unwind: "$docs" },
        { $replaceRoot: { newRoot: "$docs" } }
      ],
      // Fetch filtered documents from Jobs.Finished
      finishedJobs: [
        { $lookup: { from: "Jobs.Finished", pipeline: [{ $match: { completedAt: { $gte: ISODate("2024-01-01T00:00:00Z") } } }], as: "docs" } },
        { $unwind: "$docs" },
        { $replaceRoot: { newRoot: "$docs" } }
      ]
    }
  },
  // Merge the two result arrays into one
  { $project: { combinedJobs: { $concatArrays: ["$currentJobs", "$finishedJobs"] } } },
  // Flatten the combined array into individual documents
  { $unwind: "$combinedJobs" },
  // Set the merged documents as the root of the pipeline
  { $replaceRoot: { newRoot: "$combinedJobs" } },
  // Sort the final result set
  { $sort: { createdAt: -1 } },
  { $limit: 20 }
])

Quick Notes

Make sure your collections have consistent structures—if some fields are missing in one collection, the merged results will reflect that (you can use $addFields to fill in defaults if needed).
If you have duplicate _id values across collections and want to de-duplicate, add a $group stage after merging to keep only unique documents.

内容的提问来源于stack exchange，提问作者Nick