3
votes

I have a DocumentDB instance with about 4,000 documents. I just configured Azure Search to index and search it. This worked fine at first. Yesterday I updated the documents and the indexed fields, along with one UDF to index a complex field. Now the indexer reports that DocumentDB is returning RequestRateTooLargeException. The docs on that error suggest throttling calls, but it seems like Search would need to do that. Is there a workaround?


1 Answer

1
votes

Azure Search uses the DocumentDB client SDK, which retries internally with the appropriate timeout when it encounters a RequestRateTooLarge error. However, this only works if there are no other clients using the same DocumentDB collection concurrently. Check whether you have other concurrent users of the collection; if so, consider adding capacity to the collection.
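To illustrate the retry behavior described above, here is a minimal Python sketch. It is not the SDK's actual code: `RequestRateTooLarge`, `retry_after_ms`, and `call_with_throttle_retry` are hypothetical names, and the sketch only assumes that a throttled call carries a hint saying how long to wait before trying again.

```python
import time


class RequestRateTooLarge(Exception):
    """Hypothetical stand-in for a throttling error that carries a wait hint."""

    def __init__(self, retry_after_ms):
        super().__init__(f"throttled, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_throttle_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation(), waiting out the retry hint each time it is throttled."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RequestRateTooLarge as err:
            if attempt == max_attempts:
                raise  # out of attempts: the caller sees the throttling error
            sleep(err.retry_after_ms / 1000.0)
```

If other clients keep consuming the collection's capacity, a loop like this can use up all of its attempts, and the throttling error then reaches the caller. That is consistent with the error showing up in the indexer status.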

This could also happen because, due to some other issue with the data, the DocumentDB indexer isn't able to make forward progress. It will then retry on the same data and may hit the same data problem again, akin to a poison message. If you observe that a specific document (or a small number of documents) causes indexing problems, you can choose to ignore them. I'm pasting an excerpt from the documentation we're about to publish:

Tolerating occasional indexing failures

By default, an Azure Search indexer stops indexing as soon as even a single document fails to be indexed. Depending on your scenario, you can choose to tolerate some failures (for example, if you repeatedly re-index your entire datasource). Azure Search provides two indexer parameters to fine-tune this behavior:

  • maxFailedItems: The number of items that can fail indexing before an indexer execution is considered a failure. Default is 0.
  • maxFailedItemsPerBatch: The number of items that can fail indexing in a single batch before an indexer execution is considered a failure. Default is 0.

You can change these values at any time by specifying one or both of these parameters when creating or updating your indexer:

PUT https://service.search.windows.net/indexers/myindexer?api-version=[api-version]
Content-Type: application/json
api-key: [admin key]

    {
        "dataSourceName" : "mydatasource",
        "targetIndexName" : "myindex",
        "parameters" : { "maxFailedItems" : 10, "maxFailedItemsPerBatch" : 5 }
    }
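
For anyone scripting this, the same request can be built with Python's standard library roughly as follows. This is only a sketch: `build_indexer_update` and its parameter names are made up for illustration, and the API version and admin key are left as arguments because the excerpt does not specify them.

```python
import json
import urllib.request


def build_indexer_update(service_url, indexer_name, api_version, admin_key,
                         data_source, target_index,
                         max_failed_items=10, max_failed_items_per_batch=5):
    """Build (but do not send) the PUT request that sets failure tolerance."""
    body = {
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "parameters": {
            "maxFailedItems": max_failed_items,
            "maxFailedItemsPerBatch": max_failed_items_per_batch,
        },
    }
    url = f"{service_url}/indexers/{indexer_name}?api-version={api_version}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building and sending are kept separate here so the request can be inspected first.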

Even if you choose to tolerate some failures, information about which documents failed is returned by the Get Indexer Status API.
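
Once you have the status response as parsed JSON, pulling out the failed documents could look something like the sketch below. The field names (`lastResult`, `errors`, `key`, `errorMessage`) are my assumption about the response shape, and `failed_documents` is a hypothetical helper, so check both against the Get Indexer Status reference before relying on them.

```python
def failed_documents(status):
    """Return (key, message) pairs for the last run in a status payload.

    Assumes the payload looks like:
    {"lastResult": {"errors": [{"key": ..., "errorMessage": ...}, ...]}}
    """
    last_result = status.get("lastResult") or {}
    return [
        (err.get("key"), err.get("errorMessage"))
        for err in last_result.get("errors", [])
    ]
```

An empty list means that the last run recorded no per-document errors, or that the payload does not have the assumed shape.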