In a CreateDocumentQuery I am using MaxItemCount, and then HasMoreResults and ExecuteNextAsync, as described in other posts.
My issue is that sometimes - particularly after a large update to the DocumentDB - looping through every document gives somewhat random results, with up to half the documents being skipped.
This ONLY happens if I include a SQL query in the query setup, which I do because I only need to process a few fields/columns. If I let all fields come back it works 100% of the time, but that is inefficient: I am exporting only a couple of columns and there are close to a million records.
I need to use C# as it is a scheduled job linked up with other C# modules.
Has anyone been able to consistently loop through a large collection using paging?
Code extract below, with the SQL included; if I remove the SQL from the query there is no issue.
sql = "select d.field1, d.field2 from doc d";
var query = client.CreateDocumentQuery(
    "dbs/" + database.Id + "/colls/" + documentCollection.Id,
    sql,
    new FeedOptions { MaxItemCount = 1000 }).AsDocumentQuery();
while (query.HasMoreResults)
{
    FeedResponse<Document> res;
    while (true)
    {
        try
        {
            res = await query.ExecuteNextAsync<Document>();
            break; // success!
        }
        catch (DocumentClientException dce) when ((int?)dce.StatusCode == 429)
        {
            // DocumentDB is under pressure (429, request rate too large) -
            // wait the server-suggested interval and retry - this resolves eventually
            await Task.Delay(dce.RetryAfter > TimeSpan.Zero ? dce.RetryAfter : TimeSpan.FromSeconds(5));
        }
        catch (Exception)
        {
            errorcount++;
            throw; // rethrow without resetting the stack trace
        }
    }
    if (res.Any())
    {
        foreach (var liCurrent in res)
        {
            try
            {
                // Convert the Document to a CSV line item
                // DO THE FILE LINE CREATION HERE
                fileLineItem = "test";
                // Write the line to the file
                writer.WriteLine(fileLineItem);
            }
            catch (Exception)
            {
                errorcount++;
                throw;
            }
            totalrecords++;
        }
    }
}
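
For reference, the same export can also be driven explicitly with continuation tokens instead of relying on HasMoreResults, which makes it possible to resume from the last good page. Below is a minimal sketch of that variant, assuming the same v1 Microsoft.Azure.Documents SDK as above; collectionLink stands in for the "dbs/" + database.Id + "/colls/" + documentCollection.Id link and the CSV line creation is stubbed out as in the extract.

string continuation = null;
do
{
    var feedOptions = new FeedOptions
    {
        MaxItemCount = 1000,
        RequestContinuation = continuation // null requests the first page
    };
    var pageQuery = client.CreateDocumentQuery<Document>(collectionLink, sql, feedOptions)
        .AsDocumentQuery();

    FeedResponse<Document> page = await pageQuery.ExecuteNextAsync<Document>();
    foreach (var doc in page)
    {
        writer.WriteLine("test"); // CSV line creation stubbed, as above
        totalrecords++;
    }

    continuation = page.ResponseContinuation; // null or empty once the last page is read
} while (!string.IsNullOrEmpty(continuation));

Because each FeedResponse carries its own token, a failed page could be retried from the previous token rather than restarting the whole million-record export.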