I have an API with a single endpoint that retrieves documents from a CosmosDB collection. The endpoint works fine in common scenarios. However, when I run stress tests against the API to measure how it responds under heavy load, the site suffers outages (it starts responding to requests with 502 Bad Gateway).
Searching Application Insights, I see OutOfMemory exceptions being thrown while executing the statement that retrieves the documents from the CosmosDB collection. The method I'm using to read the documents is ReadNextAsync, and the logs point to this line specifically.
We read and tested the best practices and tips in the Cosmos DB documentation to rule out incorrect usage of the SDK on our side, but even with different configurations (MaxItemCount, MaxBufferedItemCount, MaxConcurrency), the issue persisted.
After running several tests, I noticed that if I limit the number of documents retrieved from the collection (e.g. using a TOP 40 clause), the exceptions and site outages don't occur. Instead, all requests are processed successfully with a 200 status code. As I haven't had this kind of issue on our full .NET Framework API, which has exactly the same logic as the .NET Core API described here, I'm wondering whether I am misusing the .NET Core SDK.
To share more context, I have detailed below the general specifications and how I configured CosmosDB, along with the implementation that retrieves the documents. I have also included logs and a related stack trace found in the Application Insights exceptions table.
General Specifications
- API: .NET Core 2.2
- Microsoft.Azure.Cosmos 3.5.0
Cosmos DB specifications
- CosmosDB client connection
  - Connection mode: Direct
  - Application Region: West US
  - Default values for the rest
- CosmosDB target collection
  - ~600 documents
  - ~30K size per document
  - PartitionKey -> id (one logical partition per document)
  - Write region -> West US
  - Read regions -> West US, West Europe, Southeast Asia and Brazil South
Stress scenario details
Execute 400 requests per second, each retrieving up to 200 documents.
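As a rough upper bound on the raw payload this scenario asks the API to pull from Cosmos DB, the figures above can be multiplied out. This assumes that "~30K" means about 30 KB of JSON per document and that every request really returns the full 200 documents; it ignores any extra memory used while parsing and deserializing.

```latex
\[
400\ \tfrac{\text{requests}}{\text{s}}
\times 200\ \tfrac{\text{documents}}{\text{request}}
\times 30\ \tfrac{\text{KB}}{\text{document}}
= 2{,}400{,}000\ \tfrac{\text{KB}}{\text{s}}
\approx 2.4\ \tfrac{\text{GB}}{\text{s}}
\]
```

With a TOP 40 clause the same estimate drops to roughly 0.48 GB/s, which is at least consistent with the observation that the limited query does not fail.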
Document retrieving implementation
    var feed = container.GetItemLinqQueryable<T>(false, null, queryRequestOptions)
        .Where(predicate)
        .ToFeedIterator();
    var batches = new List<FeedResponse<T>>();
    while (feed.HasMoreResults)
    {
        var batch = await feed.ReadNextAsync();
        batches.Add(batch);
    }
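For comparison, here is a minimal sketch of a variant that handles each page as it arrives instead of keeping every FeedResponse<T> in a list, with the page-size options set explicitly. The names QueryHelper, ForEachItemAsync and handleItem are hypothetical, and the option values are illustrative rather than the ones used in the tests above. Since the exception is thrown inside the SDK's own JSON parsing, it is not confirmed that this avoids it.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

// Hypothetical helper, not part of the SDK.
public static class QueryHelper
{
    // Runs the LINQ query and passes each item to handleItem page by page,
    // so only the current page has to stay referenced at any time.
    // Returns the number of items processed.
    public static async Task<int> ForEachItemAsync<T>(
        Container container,
        Expression<Func<T, bool>> predicate,
        Action<T> handleItem,
        CancellationToken cancellationToken = default)
    {
        // Illustrative values only (assumption): smaller pages and no
        // parallel prefetching, to bound how much each request buffers.
        var options = new QueryRequestOptions
        {
            MaxItemCount = 50,
            MaxBufferedItemCount = 50,
            MaxConcurrency = 1
        };

        FeedIterator<T> feed = container
            .GetItemLinqQueryable<T>(false, null, options)
            .Where(predicate)
            .ToFeedIterator();

        int count = 0;
        while (feed.HasMoreResults)
        {
            // One page per round trip; the previous page becomes
            // collectable once this loop iteration ends.
            FeedResponse<T> page = await feed.ReadNextAsync(cancellationToken);
            foreach (T item in page)
            {
                handleItem(item);
                count++;
            }
        }

        return count;
    }
}
```

A caller could pass a handleItem that writes each document straight to the HTTP response, which keeps the per-request footprint closer to one page than to the full result set.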
Application Insights exception stack trace
Response status code does not indicate success: 500 Substatus: 0 Reason: (System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Collections.Generic.List`1.set_Capacity(Int32 value)
at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)
at System.Collections.Generic.List`1.AddWithResize(T item)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseObjectNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParsePropertyNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseObjectNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseArrayNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParsePropertyNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseObjectNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseNode(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.Parse(IJsonReader jsonTextReader)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator..ctor(ReadOnlyMemory`1 buffer, Boolean skipValidation)
at Microsoft.Azure.Cosmos.Json.JsonNavigator.Create(ReadOnlyMemory`1 buffer, JsonStringDictionary jsonStringDictionary, Boolean skipValidation)
at Microsoft.Azure.Cosmos.CosmosElements.CosmosElementSerializer.ToCosmosElements(MemoryStream memoryStream, ResourceType resourceType, CosmosSerializationFormatOptions cosmosSerializationOptions)
at Microsoft.Azure.Cosmos.CosmosQueryClientCore.GetCosmosElementResponse(QueryRequestOptions requestOptions, ResourceType resourceType, ResponseMessage cosmosResponseMessage, PartitionKeyRangeIdentity partitionKeyRangeIdentity, SchedulingStopwatch schedulingStopwatch)
at Microsoft.Azure.Cosmos.CosmosQueryClientCore.ExecuteItemQueryAsync[RequestOptionType](Uri resourceUri, ResourceType resourceType, OperationType operationType, RequestOptionType requestOptions, SqlQuerySpec sqlQuerySpec, String continuationToken, PartitionKeyRangeIdentity partitionKeyRange, Boolean isContinuationExpected, Int32 pageSize, SchedulingStopwatch schedulingStopwatch, CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Query.ItemProducer.BufferMoreDocumentsAsync(CancellationToken token)
at Microsoft.Azure.Cosmos.Query.ItemProducer.BufferMoreIfEmptyAsync(CancellationToken token)
at Microsoft.Azure.Cosmos.Query.ItemProducer.TryMoveNextPageAsync(CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Query.ItemProducerTree.TryMoveNextPageImplementationAsync(CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Query.ItemProducerTree.ExecuteWithSplitProofingAsync(Func`2 function, Boolean functionNeedsBeReexecuted, CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Query.ItemProducerTree.TryMoveNextPageAsync(CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Query.CosmosParallelItemQueryExecutionContext.DrainAsync(Int32 maxElements, CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.PipelinedDocumentQueryExecutionContext.ExecuteNextAsync(CancellationToken token)
at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.LazyCosmosQueryExecutionContext.ExecuteNextAsync(CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CosmosQueryExecutionContextWithNameCacheStaleRetry.ExecuteNextAsync(CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CatchAllCosmosQueryExecutionContext.ExecuteNextAsync(CancellationToken cancellationToken)).
{"assembly":"Microsoft.Azure.Cosmos.Client, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35","method":"Microsoft.Azure.Cosmos.ResponseMessage.EnsureSuccessStatusCode","level":0,"line":0}
{"assembly":"Microsoft.Azure.Cosmos.Client, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35","method":"Microsoft.Azure.Cosmos.CosmosResponseFactory.CreateQueryFeedResponseHelper","level":1,"line":0}
{"assembly":"Microsoft.Azure.Cosmos.Client, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35","method":"Microsoft.Azure.Cosmos.CosmosResponseFactory.CreateQueryFeedResponse","level":2,"line":0}
{"assembly":"Microsoft.Azure.Cosmos.Client, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35","method":"Microsoft.Azure.Cosmos.FeedIteratorCore`1+<ReadNextAsync>d__5.MoveNext","level":3,"line":0}
queryRequestOptions? - Matias Quaranta
queryRequestOptions are default. - Lucas Marambio