3
votes

I'm new to MongoDB (and its dotnet core C# driver), and I have the following question regarding the IAsyncCursor behavior:

From the official documentation: https://docs.mongodb.com/getting-started/csharp/query/, it seems that the recommended way of iterating through the IAsyncCursor is:

var collection = _database.GetCollection<BsonDocument>("restaurants");
var filter = new BsonDocument();
var count = 0;
using (var cursor = await collection.FindAsync(filter))
{
    while (await cursor.MoveNextAsync())
    {
        var batch = cursor.Current;
        foreach (var document in batch)
        {
            // process document
            count++;
        }
    }
}

However, to get the "current" batch of returned documents, the while loop calls "MoveNextAsync" first. Does "MoveNextAsync" skip the "current" batch? Or, logically, would the following modified code snippet make more sense?

var collection = _database.GetCollection<BsonDocument>("restaurants");
var filter = new BsonDocument();
var count = 0;
using (var cursor = await collection.FindAsync(filter))
{
    do
    {
        if (cursor.Current != null)
        {
            var batch = cursor.Current;
            foreach (var document in batch)
            {
                // process document
                count++;
            }
        }
    }
    while (await cursor.MoveNextAsync());
}

My understanding is that the cursor should start by pointing to the "current" batch already (if any), and I should first work on whatever the "current" batch is, and then move to the next batch of documents, if any.

But in all the sources I can find online, the iteration always calls "MoveNext" first and then works on the batch. This gives me the impression that the IAsyncCursor returned by FindAsync starts at a position one before the actual "current" (or first) batch of documents, and that "MoveNext" must be called first to move the cursor onto it.
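
For comparison, .NET's own IEnumerator<T> follows the same convention: the enumerator starts before the first element, and MoveNext() must be called before Current is read. Below is a minimal sketch of that analogy using only the base class library. It models the convention rather than the MongoDB driver itself, and the "batches" list is made-up sample data.

```csharp
using System;
using System.Collections.Generic;

// Made-up sample data standing in for two batches of documents.
var batches = new List<string[]>
{
    new[] { "a", "b" },
    new[] { "c" },
};

var seen = new List<string>();
using (IEnumerator<string[]> enumerator = batches.GetEnumerator())
{
    // The enumerator starts *before* the first element, so the first
    // MoveNext() advances onto it rather than skipping it.
    while (enumerator.MoveNext())
    {
        foreach (var item in enumerator.Current)
        {
            seen.Add(item);
        }
    }
}

Console.WriteLine(string.Join(",", seen)); // prints "a,b,c"
```

If the cursor follows the same pattern, the while loop in the first snippet visits every batch, including the first one.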

From a coding point of view, calling "MoveNext" first makes the while loop more consistent, so my own code snippet doesn't have to (redundantly) check the validity of "current" inside the body of the "do" loop.

However, I do find that "IAsyncCursor.First()" does return the "first" document - I'm guessing now that the "First()" method actually does a "MoveNext" internally, and returns the first document of the "current" batch.

Also, since I'm using "FindAsync": if no document matches my filter, is the returned IAsyncCursor "null", or will "MoveNext" return false? Can I assume that the IAsyncCursor returned by FindAsync is always a valid object, so I don't have to check for null everywhere and only need to check the result of "MoveNext()" or "First()"?

Could any MongoDB experts share their insights on this?

Thanks!

2 Answers

3
votes

The first code sample is correct and doesn't skip the first batch. However, you only need to directly use MoveNextAsync if you want explicit control of fetching batches.

Otherwise, it's simpler to use ForEachAsync, which wraps that complexity for you:

using (var cursor = await collection.FindAsync(filter))
{
    await cursor.ForEachAsync(document =>
    {
        // process document
        count++;
    });
}

See the ForEachAsync source here.

As shown in the source, ForEachAsync takes ownership of the cursor and disposes it for you so you can also omit your own using if you like.

2
votes

From all my testing and the sources I have found online, my original observation seems to be correct: the IAsyncCursor returned by FindAsync starts in a neutral state ("Current" seems to be null), and MoveNext(Async) has to be called first to advance it to the first batch, after which "Current" contains the first batch of documents.

I also observed that for a find with no matching documents, MoveNext succeeds and leaves "Current" non-null, but Current.Count() returns 0. In other words, MoveNext succeeded, yet the batch contains no documents.

This exposes what looks to me like an inconsistency in the API design. Should "MoveNext" return false to indicate that there are no more documents? Should "Current" be null after a successful "MoveNext" to indicate that there are no documents? Or should "Current" be non-null, with Current.Count() == 0 indicating that there are no documents? For now I'm checking all three conditions to make sure my code is safe, though I think having "MoveNext" return false when there are no further documents would be the most intuitive API to use.
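
For what it's worth, the plain while/foreach loop seems to cope with an empty batch without any extra checks, because a foreach over an empty batch simply runs zero times. Below is a minimal sketch of that reasoning. It uses a hypothetical ToyCursor class that only imitates the behavior I observed (null "Current" before the first move, and a successful first move even when the batch is empty). It is not the driver's IAsyncCursor<T>, and the names ToyCursor and CountAsync are made up for illustration.

```csharp
#nullable disable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// One result set with documents in two batches, and one with no matches.
Console.WriteLine(await CountAsync(new ToyCursor<string>(new[] { "a", "b" }, new[] { "c" }))); // prints 3
Console.WriteLine(await CountAsync(new ToyCursor<string>(Array.Empty<string>())));            // prints 0

static async Task<int> CountAsync(ToyCursor<string> cursor)
{
    var count = 0;
    while (await cursor.MoveNextAsync())
    {
        // An empty batch yields zero iterations, so no extra
        // null or Count() check is needed inside the loop.
        foreach (var document in cursor.Current)
        {
            count++;
        }
    }
    return count;
}

// Hypothetical stand-in for a cursor; NOT the driver's IAsyncCursor<T>.
sealed class ToyCursor<T>
{
    private readonly Queue<T[]> _batches;

    public ToyCursor(params T[][] batches)
    {
        _batches = new Queue<T[]>(batches);
    }

    // Null until the first MoveNextAsync call, imitating the observation above.
    public T[] Current { get; private set; }

    public Task<bool> MoveNextAsync()
    {
        if (_batches.Count == 0)
        {
            Current = null;
            return Task.FromResult(false);
        }

        Current = _batches.Dequeue();
        return Task.FromResult(true);
    }
}
```

Under these assumptions, the only condition the loop needs is the return value of MoveNextAsync. "Current" is read only after a successful move, so it is never null at that point.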