3
votes

I am wondering whether using the $unwind operator in an aggregation pipeline on documents with a nested array will return the deconstructed documents in the same order as the items in the array. Example: suppose I have the following documents:

{ "_id" : 1, "item" : "foo", values: [ "foo", "foo2", "foo3"] }
{ "_id" : 2, "item" : "bar", values: [ "bar", "bar2", "bar3"] }
{ "_id" : 3, "item" : "baz", values: [ "baz", "baz2", "baz3"] }

I would like to page through all values in all documents from my application code. So my idea is to use the MongoDB aggregation framework to:

  1. sort the documents by _id
  2. use $unwind on values attribute to deconstruct the documents
  3. use $skip and $limit to simulate paging
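
The three steps above could be sketched as a small pipeline builder. This is only an illustration: the helper name `buildPagePipeline`, its parameters `page` and `pageSize`, and the collection name `items` in the usage comment are assumptions, not part of any MongoDB API. It also assumes the unwound order is stable between calls, which is exactly what the question asks.

```javascript
// Hypothetical helper: builds the paging pipeline described in steps 1-3.
// `page` is zero-based; `pageSize` is the number of unwound values per page.
function buildPagePipeline(page, pageSize) {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  return [
    { $sort: { _id: 1 } },       // step 1: order the documents by _id
    { $unwind: "$values" },      // step 2: one output document per array item
    { $skip: page * pageSize },  // step 3: skip the previous pages ...
    { $limit: pageSize },        // ... and keep a single page
  ];
}

// Usage in the mongo shell (collection name `items` is an assumption):
// db.items.aggregate(buildPagePipeline(1, 4))
```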

So the question using the example described above is:

Is it guaranteed that the following aggregation pipeline:

[
    {$sort: {"_id": 1}},
    {$unwind: "$values"}
]

will always result in the following documents, in exactly this order?

{ "_id" : 1, "item" : "foo", values: "foo" }
{ "_id" : 1, "item" : "foo", values: "foo2" }
{ "_id" : 1, "item" : "foo", values: "foo3" }
{ "_id" : 2, "item" : "bar", values: "bar" }
{ "_id" : 2, "item" : "bar", values: "bar2" }
{ "_id" : 2, "item" : "bar", values: "bar3" }
{ "_id" : 3, "item" : "baz", values: "baz" }
{ "_id" : 3, "item" : "baz", values: "baz2" }
{ "_id" : 3, "item" : "baz", values: "baz3" }
3
I can't rely on this online example. In real life I will have many more documents and items in the array (maybe millions). I am looking for an official answer. It is just like with a simple find({}): the order of the documents is not guaranteed, yet when tested with a small set of documents the returned order is the same. - Aleydin Karaimin
This is not explicitly documented. If you are paying for Atlas or MongoDB Enterprise, I suggest you go through the official support channels. - D. SM
MongoDB source code calls the libunwind function, which iterates over the frames in the chain. Since such functions read binary files on disk, it's not possible to alter the order, or to skip or swap frames... - Valijon
@Valijon Thank you for this answer. I am not aware of how the WiredTiger engine actually stores documents on disk. But it seems logical that, since the array order is guaranteed, the order of the deconstructed documents will always be the same. Even the examples in the official documentation are ordered :), but it is not mentioned whether we can rely on that order. - Aleydin Karaimin

3 Answers

2
votes

In case you do run into issues with the order, you could use includeArrayIndex and sort on it to guarantee the order.

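[
 // Field paths need the "$" prefix; arrayIndex receives each item's position.
 {$unwind: {
   path: '$values',
   includeArrayIndex: 'arrayIndex'
 }},
 // Sort by document first, then by the item's original position in the array.
 {$sort: {
   _id: 1,
   arrayIndex: 1
 }},
 // Drop the helper field from the output.
 {$project: {
    arrayIndex: 0
 }}
]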
2
votes

From what I see at https://github.com/mongodb/mongo/blob/0cee67ce6909ca653462d4609e47edcc4ac5c1a9/src/mongo/db/pipeline/document_source_unwind.cpp

The cursor iterator uses the getNext() method to unwind an array:

DocumentSource::GetNextResult DocumentSourceUnwind::doGetNext() {
    auto nextOut = _unwinder->getNext();
    while (nextOut.isEOF()) {
        .....
        // Try to extract an output document from the new input document.
        _unwinder->resetDocument(nextInput.releaseDocument());
        nextOut = _unwinder->getNext();
    }

    return nextOut;
}

And the getNext() implementation relies on the array's index:

DocumentSource::GetNextResult DocumentSourceUnwind::Unwinder::getNext() {

            ....
            // Set field to be the next element in the array. If needed, this will automatically
            // clone all the documents along the field path so that the end values are not shared
            // across documents that have come out of this pipeline operator. This is a partial deep
            // clone. Because the value at the end will be replaced, everything along the path
            // leading to that will be replaced in order not to share that change with any other
            // clones (or the original).
            _output.setNestedField(_unwindPathFieldIndexes, _inputArray[_index]);
            indexForOutput = _index;
            _index++;
            _haveNext = _index < length;

            .....
    return _haveNext ? _output.peek() : _output.freeze();
}

So unless something upstream changes the order of the documents, the cursor should return the unwound docs in the same order as the items were stored in the array.
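
To make the loop above easier to follow, here is a simplified JavaScript model of it. This is an illustration only, not MongoDB code: the generator name `unwind` is made up, it handles only top-level fields, and it ignores options such as includeArrayIndex and preserveNullAndEmptyArrays.

```javascript
// Simplified model of the Unwinder loop: for each input document, walk the
// array by increasing index and emit one copy of the document per element.
function* unwind(docs, field) {
  for (const doc of docs) {
    const value = doc[field];
    if (value === null || value === undefined) {
      continue; // missing or null field: no output for this document
    }
    if (!Array.isArray(value)) {
      yield { ...doc }; // a non-array value is treated like a one-element array
      continue;
    }
    // An empty array produces no output because the loop body never runs.
    for (let index = 0; index < value.length; index++) {
      // Shallow copy of the document with the array replaced by one element.
      yield { ...doc, [field]: value[index] };
    }
  }
}

const input = [
  { _id: 1, item: "foo", values: ["foo", "foo2", "foo3"] },
  { _id: 2, item: "bar", values: ["bar", "bar2", "bar3"] },
];
const output = [...unwind(input, "values")];
```

Within one input document, an element at a higher index can only be emitted after every element at a lower index, because the index is only ever incremented.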

I don't recall how the merger works for sharded collections, and I imagine there might be cases where documents from other shards are returned between two consecutive unwound documents. What the code snippet does guarantee is that an unwound document holding a later item from the array will never be returned before the unwound document holding an earlier item from the same array.

As a side note, having a million items in an array is quite an extreme design. Even with 20-byte items, a million of them (about 20 MB) would exceed the 16 MB document limit.
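
The arithmetic behind that side note, as a rough sketch. It assumes the limit is 16 × 1024 × 1024 bytes and counts only the raw item payload; BSON adds per-element overhead on top, so the real limit is reached even sooner. All variable names are illustrative.

```javascript
// Rough size estimate: payload only, BSON per-element overhead is ignored.
const MAX_DOC_BYTES = 16 * 1024 * 1024;        // assumed limit: 16,777,216 bytes
const itemCount = 1000000;                     // one million array items
const bytesPerItem = 20;                       // size of each item
const payloadBytes = itemCount * bytesPerItem; // 20,000,000 bytes
const exceedsLimit = payloadBytes > MAX_DOC_BYTES;

// Upper bound on how many 20-byte items could fit, again ignoring overhead.
const maxItems = Math.floor(MAX_DOC_BYTES / bytesPerItem);
```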

2
votes

I also asked the same question in the MongoDB community forum. A member of the MongoDB staff posted an answer that confirms my assumption.

Briefly:

Yes, the order of the returned documents in the example above will always be the same. It follows the order from the array field.