
I have an API with a single endpoint that retrieves documents from a Cosmos DB collection. The endpoint works fine in normal scenarios. However, when I run stress tests against the API to measure how it behaves under heavy load, the site goes down: it starts responding to requests with 502 Bad Gateway.

Searching in Application Insights, I noticed OutOfMemoryException errors being thrown while executing the statement that retrieves the documents from the Cosmos DB collection. The method I'm using to read the documents is ReadNextAsync, and the logs point to this line specifically.

We read and tested the best practices and performance tips in the Cosmos DB documentation to rule out misuse of the SDK on our side, but even after trying different configurations (MaxItemCount, MaxBufferedItemCount, MaxConcurrency), the issue persisted.
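For reference, this is how those three knobs are set on QueryRequestOptions in Microsoft.Azure.Cosmos 3.x (the buffering setting is exposed as MaxBufferedItemCount). QueryOptionsSketch and CreateLowMemoryOptions are hypothetical names, and the numbers are placeholders, not the configurations actually tried here and not recommendations. The intent is only to show which properties bound how many documents a single query keeps in memory at once.

```csharp
using Microsoft.Azure.Cosmos;

public static class QueryOptionsSketch
{
    // Hypothetical helper: builds options intended to keep per-query memory small.
    // All numeric values are illustrative placeholders.
    public static QueryRequestOptions CreateLowMemoryOptions()
    {
        return new QueryRequestOptions
        {
            // Maximum number of items returned in one page (one ReadNextAsync call).
            MaxItemCount = 50,
            // Maximum number of items the SDK may buffer client-side during parallel query execution.
            MaxBufferedItemCount = 50,
            // Client-side parallelism used for cross-partition queries.
            MaxConcurrency = 1
        };
    }
}
```

The options object would then be passed as the requestOptions argument of GetItemLinqQueryable or GetItemQueryIterator.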

After several tests, I noticed that if I limit the number of documents retrieved from the collection (e.g. with a TOP 40 clause), neither the exceptions nor the outages occur; instead, all requests complete successfully with a 200 status code. Since we haven't had this kind of issue with our full .NET Framework API, which has exactly the same logic as the .NET Core API described here, I'm wondering whether I'm misusing the .NET Core SDK.

To give more context, below are the general specifications, the Cosmos DB configuration, and the implementation used to retrieve documents. I've also included logs and a related stack trace from the Application Insights exceptions table.

General specifications

  • API: .NET Core 2.2
  • Microsoft.Azure.Cosmos 3.5.0

Cosmos DB specifications

  • Cosmos DB client connection
    • Connection mode: Direct
    • Application Region: West US
    • Default values for the rest
  • Cosmos DB target collection
    • ~600 documents
    • ~30 KB per document
    • PartitionKey -> id (one logical partition per document)
    • Write region -> West US
    • Read regions -> West US, West Europe, Southeast Asia and Brazil South

Stress scenario details

400 requests per second, each retrieving up to 200 documents.
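A back-of-the-envelope estimate of the data volume in this scenario, assuming "~30K" means roughly 30 KB (30 × 1024 bytes) of JSON per document and that every request returns the full 200 documents. LoadEstimate is a hypothetical helper used only for the arithmetic.

```csharp
public static class LoadEstimate
{
    // Raw JSON bytes returned by a single request.
    public static long BytesPerRequest(long docsPerRequest, long bytesPerDoc)
    {
        return docsPerRequest * bytesPerDoc;
    }

    // Raw JSON bytes returned per second across all concurrent requests.
    public static long BytesPerSecond(long requestsPerSecond, long docsPerRequest, long bytesPerDoc)
    {
        return requestsPerSecond * BytesPerRequest(docsPerRequest, bytesPerDoc);
    }
}
```

With 30 × 1024 bytes per document this gives 6,144,000 bytes (about 5.9 MiB) per request and 2,457,600,000 bytes (about 2.3 GiB) per second, counting only the raw payload and not any in-memory representation built from it.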

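Document retrieval implementation

using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

var feed = container.GetItemLinqQueryable<T>(false, null, queryRequestOptions).Where(predicate).ToFeedIterator();
var batches = new List<FeedResponse<T>>();

// Drain every page of the query; each FeedResponse<T> stays referenced until the loop ends.
while (feed.HasMoreResults)
{
    var batch = await feed.ReadNextAsync();
    batches.Add(batch);
}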
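One variation worth trying, sketched under assumptions: the loop above keeps every FeedResponse<T> (items plus response headers) alive until the end. The sketch below copies only the items out of each page, asks for smaller pages through MaxItemCount, and caps the result with LINQ's Take, which the Cosmos LINQ provider translates into a TOP clause like the TOP 40 experiment. CappedQuery, ReadCappedAsync, maxItems and pageSize are hypothetical names, and this is not a confirmed fix: the stack trace places the OutOfMemoryException inside the SDK's own JSON parsing, before the page ever reaches this code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

public static class CappedQuery
{
    // Hypothetical helper: runs the filtered query, keeps only the items (not the
    // FeedResponse objects) and never returns more than maxItems documents.
    public static async Task<List<T>> ReadCappedAsync<T>(
        Container container,
        Expression<Func<T, bool>> predicate,
        int maxItems,
        int pageSize,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        // Smaller pages mean fewer documents parsed per ReadNextAsync call.
        var options = new QueryRequestOptions { MaxItemCount = pageSize };

        // Take(maxItems) is translated into a TOP clause in the generated SQL.
        FeedIterator<T> feed = container
            .GetItemLinqQueryable<T>(false, null, options)
            .Where(predicate)
            .Take(maxItems)
            .ToFeedIterator();

        var results = new List<T>();
        while (feed.HasMoreResults && results.Count < maxItems)
        {
            FeedResponse<T> page = await feed.ReadNextAsync(cancellationToken);
            // Copy the deserialized items; the page object itself is not retained.
            results.AddRange(page);
        }
        return results;
    }
}
```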

Application Insights exception stack trace

Response status code does not indicate success: 500 Substatus: 0 Reason: (System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Collections.Generic.List`1.set_Capacity(Int32 value)
   at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)
   at System.Collections.Generic.List`1.AddWithResize(T item)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseObjectNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParsePropertyNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseObjectNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseArrayNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParsePropertyNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseObjectNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.ParseNode(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator.Parser.Parse(IJsonReader jsonTextReader)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.JsonTextNavigator..ctor(ReadOnlyMemory`1 buffer, Boolean skipValidation)
   at Microsoft.Azure.Cosmos.Json.JsonNavigator.Create(ReadOnlyMemory`1 buffer, JsonStringDictionary jsonStringDictionary, Boolean skipValidation)
   at Microsoft.Azure.Cosmos.CosmosElements.CosmosElementSerializer.ToCosmosElements(MemoryStream memoryStream, ResourceType resourceType, CosmosSerializationFormatOptions cosmosSerializationOptions)
   at Microsoft.Azure.Cosmos.CosmosQueryClientCore.GetCosmosElementResponse(QueryRequestOptions requestOptions, ResourceType resourceType, ResponseMessage cosmosResponseMessage, PartitionKeyRangeIdentity partitionKeyRangeIdentity, SchedulingStopwatch schedulingStopwatch)
   at Microsoft.Azure.Cosmos.CosmosQueryClientCore.ExecuteItemQueryAsync[RequestOptionType](Uri resourceUri, ResourceType resourceType, OperationType operationType, RequestOptionType requestOptions, SqlQuerySpec sqlQuerySpec, String continuationToken, PartitionKeyRangeIdentity partitionKeyRange, Boolean isContinuationExpected, Int32 pageSize, SchedulingStopwatch schedulingStopwatch, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.ItemProducer.BufferMoreDocumentsAsync(CancellationToken token)
   at Microsoft.Azure.Cosmos.Query.ItemProducer.BufferMoreIfEmptyAsync(CancellationToken token)
   at Microsoft.Azure.Cosmos.Query.ItemProducer.TryMoveNextPageAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.ItemProducerTree.TryMoveNextPageImplementationAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.ItemProducerTree.ExecuteWithSplitProofingAsync(Func`2 function, Boolean functionNeedsBeReexecuted, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.ItemProducerTree.TryMoveNextPageAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.CosmosParallelItemQueryExecutionContext.DrainAsync(Int32 maxElements, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.PipelinedDocumentQueryExecutionContext.ExecuteNextAsync(CancellationToken token)
   at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.LazyCosmosQueryExecutionContext.ExecuteNextAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CosmosQueryExecutionContextWithNameCacheStaleRetry.ExecuteNextAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.Query.Core.ExecutionContext.CatchAllCosmosQueryExecutionContext.ExecuteNextAsync(CancellationToken cancellationToken)).

{"assembly":"Microsoft.Azure.Cosmos.Client, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35","method":"Microsoft.Azure.Cosmos.ResponseMessage.EnsureSuccessStatusCode","level":0,"line":0}

{"assembly":"Microsoft.Azure.Cosmos.Client, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35","method":"Microsoft.Azure.Cosmos.CosmosResponseFactory.CreateQueryFeedResponseHelper","level":1,"line":0}

{"assembly":"Microsoft.Azure.Cosmos.Client, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35","method":"Microsoft.Azure.Cosmos.CosmosResponseFactory.CreateQueryFeedResponse","level":2,"line":0}

{"assembly":"Microsoft.Azure.Cosmos.Client, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35","method":"Microsoft.Azure.Cosmos.FeedIteratorCore`1+<ReadNextAsync>d__5.MoveNext","level":3,"line":0}
Comments
What's stopping you from splitting this into multiple requests? - Train
What are you passing as queryRequestOptions? - Matias Quaranta
@Train Do you mean paginating over Cosmos DB to retrieve all the requested documents? In that case we received ServiceUnavailableExceptions, since the number of requests increased as well. However, that doesn't explain why we don't get OutOfMemory exceptions with the full .NET Framework. - Lucas Marambio
@MatiasQuaranta queryRequestOptions are default. - Lucas Marambio

Answer

I haven't used Cosmos DB, so I'm not sure this is really relevant, but according to the Azure documentation each response (for example, a page of query results) is limited to 4 MB.

Am I correct in thinking that there is no filtering in the example code above, meaning all 600 documents (~30 KB each) are returned?
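To put the 4 MB figure next to the numbers in the question, here is a quick minimum-page estimate, assuming ~30 KB means 30 × 1024 bytes per document, taking 4 MB as 4 × 1024 × 1024 bytes, and ignoring per-page metadata. PageEstimate is a hypothetical helper.

```csharp
public static class PageEstimate
{
    // Assumed response size limit: 4 MB per page, taken as 4 * 1024 * 1024 bytes.
    public const long MaxResponseBytes = 4L * 1024 * 1024;

    // Minimum number of pages needed to return `documents` items of `bytesPerDoc` each
    // (ceiling division of the total payload by the per-page limit).
    public static long MinimumPages(long documents, long bytesPerDoc)
    {
        long totalBytes = documents * bytesPerDoc;
        return (totalBytes + MaxResponseBytes - 1) / MaxResponseBytes;
    }
}
```

Under these assumptions, 200 documents need at least 2 pages and all 600 documents need at least 5, so the SDK has to fetch several pages per request either way.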

You might have more success splitting this into multiple requests.
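One way to split the work into multiple requests is to expose paging to the API's own clients: each API call reads a single page and returns the continuation token, so the next call resumes where the previous one stopped. This is a sketch assuming the query can be expressed as a QueryDefinition; PageResult, CosmosPaging, ReadPageAsync and pageSize are hypothetical names. As reported in the comments, paginating also increases the request count, which in the asker's tests produced ServiceUnavailableExceptions.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical result type: one page of items plus the token for the next page.
public sealed class PageResult<T>
{
    public List<T> Items { get; set; }
    public string ContinuationToken { get; set; } // null when there are no more pages
}

public static class CosmosPaging
{
    // Reads exactly one page of the query. Pass the token returned by the previous
    // call, or null for the first page.
    public static async Task<PageResult<T>> ReadPageAsync<T>(
        Container container,
        QueryDefinition query,
        string continuationToken,
        int pageSize,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        FeedIterator<T> iterator = container.GetItemQueryIterator<T>(
            query,
            continuationToken,
            new QueryRequestOptions { MaxItemCount = pageSize });

        var result = new PageResult<T> { Items = new List<T>() };
        if (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync(cancellationToken);
            result.Items.AddRange(page);
            result.ContinuationToken = page.ContinuationToken;
        }
        return result;
    }
}
```

A page can be empty while still carrying a continuation token, so a client should stop only when ContinuationToken is null rather than when Items is empty.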