Operations launched in parallel usually end up timing out or failing
I'm using:
- Azure Cosmos DB Table API
- .NET Core 2.0 (console app)
- WindowsAzure.Storage (9.2.0) nuget package
Standard CloudStorageAccount.Parse(...).CreateCloudTableClient().GetTableReference("...") setup
The code below fires off 8999 tasks at once. Each task is a TableOperation.Retrieve specifying both PartitionKey & RowKey. I injected some code to track task completion states and any IRetryPolicy hits. There are no 429 errors.
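For context, completion tracking of this kind can be sketched roughly as follows. This is an illustrative helper, not the exact instrumentation used for the run below: `ProgressTracker`, `WhenAllWithProgress`, and the `log` callback are made-up names, and it leaves out the IRetryPolicy hook entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ProgressTracker
{
    // Awaits all tasks, logging the completed count once per interval.
    // Returns how many tasks finished without throwing.
    public static async Task<int> WhenAllWithProgress<T>(
        IEnumerable<Task<T>> tasks, TimeSpan interval, Action<string> log)
    {
        int done = 0;
        var clock = Stopwatch.StartNew();

        // Wrap each task so that a successful completion bumps the counter
        // and a failure is logged with the elapsed time instead of rethrown.
        var tracked = tasks.Select(async t =>
        {
            try
            {
                await t.ConfigureAwait(false);
                Interlocked.Increment(ref done);
            }
            catch (Exception ex)
            {
                log($"{clock.Elapsed:m\\:ss} - {ex.Message}");
            }
        }).ToList();

        var all = Task.WhenAll(tracked);

        // Wake up every interval until the whole batch has finished.
        while (await Task.WhenAny(all, Task.Delay(interval)).ConfigureAwait(false) != all)
        {
            log($"{clock.Elapsed:m\\:ss} - {Volatile.Read(ref done)} done");
        }

        await all.ConfigureAwait(false);
        return done;
    }
}
```

With a helper like this, the call would look something like `await ProgressTracker.WhenAllWithProgress(stuff.Select(x => table.ExecuteAsync(opGetter.Get(x))), TimeSpan.FromSeconds(1), Console.WriteLine);`.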
Here's the call, followed by the output from a recent run:
await Task.WhenAll(stuff.Select(x => table.ExecuteAsync(opGetter.Get(x))));
0:01 - 12 done
0:02 - 228 done
0:03 - 313 done
0:04 - 435 done
0:05 - 1010 done
0:06 - 1883 done
0:07 - 2833 done
0:08 - 3014 done
0:09 - 3878 done
0:10 - 5447 done
0:11 - 5569 done
0:12 - 6492 done
0:13 - 6527 done
0:14 - 6532 done
0:15 - 6541 done
0:16 - 6543 done
0:17 - 6547 done
0:18 - 6552 done
0:19 - 6554 done
0:20 - 6951 done
0:21 - 8105 done
0:22 - 8128 done
0:23 - 8591 done
0:24 - 8907 done
0:25 - 8908 done
0:29 - 8994 done
0:32 - 8996 done
1:14 - 8997 done
2:26 - 8998 done
5:02 - StatusCode: 0 "An error occurred while sending the request."
5:05 - All 8999 Done
(Sometimes I get a client-side Timeout instead of that particular error, or multiple errors.)
These 8999 retrievals should ideally take a few seconds at most.
How can I stop them from stalling?
Note:
- I haven't changed any ServicePoint settings such as max connections, etc.
- I can't use "Direct Mode" or "TCP" (vs Gateway/Https) because there's no SDK supporting Cosmos DB Table API for .NET Core or Standard
- I suspect the trouble is client-side
- This is being run from a local server (not on azure)
- This isn't "occasional". It's what happens nearly every time, with a large batch like this.
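Since the trouble looks client-side, one experiment that might narrow it down is capping how many requests are in flight at once instead of starting all 8999 together. Below is a minimal sketch using only `SemaphoreSlim` from the BCL. `Throttle` and `RunBounded` are hypothetical names, the limit is an arbitrary value to tune, and whether this actually avoids the stalls against Cosmos DB is untested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `work` for every item, allowing at most `maxInFlight`
    // calls to be pending at any one time. Results keep input order.
    public static async Task<TResult[]> RunBounded<TItem, TResult>(
        IEnumerable<TItem> items, int maxInFlight, Func<TItem, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var tasks = items.Select(async item =>
            {
                // Wait for a free slot before issuing the request.
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await work(item).ConfigureAwait(false);
                }
                finally
                {
                    // Free the slot whether the call succeeded or threw.
                    gate.Release();
                }
            }).ToList();

            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

Applied to the call above, this would be something like `await Throttle.RunBounded(stuff, 64, x => table.ExecuteAsync(opGetter.Get(x)));`, where 64 is just a starting guess. If the stalls disappear at a low limit and return at a high one, that would point toward connection or socket exhaustion on the client rather than the service.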
EDIT: Also posted as a GitHub issue. https://github.com/Azure/azure-documentdb-dotnet/issues/517