I'm on the team that runs nuget.org, the package ecosystem for .NET. We use Azure Search to power our search API. Our APIs are public, so third-party customers can use them to analyze the ecosystem or build apps.
We recently had an outage caused by a single customer paging through our search index with the $skip and $top query parameters, 200 documents at a time. This caused Azure Search to throttle our requests with the following error:
> Failed to execute request because the request rate has caused your service to exceed the limits of its provisioned capacity. Reduce the rate of requests, or adjust the number of replicas/partitions. See http://aka.ms/azure-search-throttling for more information.
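For reference, here is a rough Python sketch of the paging pattern described above. It is only an approximation: we don't know the customer's exact queries, so `search=*` (match all documents) and the helper names `skip_top_pages` and `page_params` are hypothetical. It shows that each page is a separate request whose `$skip` grows by 200, so walking the whole index means thousands of requests with ever larger offsets.

```python
from typing import Iterator, Tuple


def skip_top_pages(total_docs: int, page_size: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield ($skip, $top) pairs for walking an index front to back."""
    skip = 0
    while skip < total_docs:
        yield skip, page_size
        skip += page_size


def page_params(skip: int, top: int) -> dict:
    # Hypothetical query parameters for one page; search=* matches every document.
    return {"search": "*", "$skip": skip, "$top": top}


if __name__ == "__main__":
    pages = list(skip_top_pages(950_000))
    print(len(pages))  # 4750
    print(pages[-1])   # (949800, 200)
```

With our ~950K-document index, a full walk at 200 documents per page takes 4,750 requests, and the last one asks the service to skip 949,800 results.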
The throttling lasted 10 minutes and affected all customers in that region, not just the one customer that was paging. We have read Azure Search's throttling documentation but still have the following questions:
- Is customer paging with high $skip values particularly expensive for Azure Search?
- What can we do to reduce the likelihood of Azure Search throttling for paging scenarios?
- Should we add our own throttling so that a single customer's searches don't affect everyone else's? Does Azure Search have guidance on this?
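One common way to implement our own throttling, which we are considering rather than something Azure Search prescribes, is a per-client token bucket in front of the search backend. The sketch below assumes we can identify a client by some key (an API key or IP address; `client_key` is hypothetical), and the `capacity` and `rate` values are placeholders, not recommendations.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    """Holds at most `capacity` tokens, refilled at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float
    last: float

    def try_take(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client key so one noisy client can't starve the others."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_key: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(client_key)
        if bucket is None:
            # New clients start with a full bucket, allowing a short burst.
            bucket = TokenBucket(self.capacity, self.rate, self.capacity, now)
            self.buckets[client_key] = bucket
        return bucket.try_take(now)
```

A rejected request would typically get an HTTP 429 response with a `Retry-After` header instead of being forwarded to Azure Search. This sketch keeps state in memory on a single instance and never evicts idle clients; with several front-end instances the buckets would need shared storage.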
Some more information about our service:
- Number of documents in index: ~950K
- Request volume: ~1.3K paging requests over ~10 minutes; peak of 125 requests per second, average of 6 requests per second
- Scale: standard SKU, 1 partition, 3 replicas (this is our secondary region, hence the smaller scale to save money)
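With ~950K documents, $skip-based paging may not even reach the whole index: if we read the REST API reference correctly, $skip is capped at 100,000. A common alternative, which we have not yet tried, is keyset (range-filter) pagination: sort by a unique field and ask for documents after the last key seen, so no request carries a large offset. In this sketch the field name `id` and the helpers are hypothetical; the field must be filterable and sortable, and the service's string ordering must match the `gt` comparison for this to be correct.

```python
from typing import Callable, Iterator, List, Optional


def odata_string(value: str) -> str:
    # OData string literals are single-quoted; embedded quotes are doubled.
    return "'" + value.replace("'", "''") + "'"


def keyset_page_params(key_field: str, after: Optional[str], top: int = 200) -> dict:
    # Sort by the key and, after the first page, filter to keys past the last one seen.
    params = {"search": "*", "$orderby": f"{key_field} asc", "$top": top}
    if after is not None:
        params["$filter"] = f"{key_field} gt {odata_string(after)}"
    return params


def iterate_all(search: Callable[[dict], List[dict]],
                key_field: str, top: int = 200) -> Iterator[dict]:
    """Walk every document; `search` is a hypothetical function sending one request."""
    after: Optional[str] = None
    while True:
        docs = search(keyset_page_params(key_field, after, top))
        yield from docs
        if len(docs) < top:
            return  # a short (or empty) page means we've reached the end
        after = docs[-1][key_field]
```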