1
votes

I have an API that retrieves documents based on keywords that appear in document fields. I would like to paginate the results so that a client receives one page of documents per request and can ask for more if needed. The query itself takes only about a second in the browser when I run it in the Azure Data Explorer, but about a minute when I run it through the Python DocumentDB library.

Looking at the Microsoft Cosmos DB REST API, it appears that two request headers are used for this: x-ms-continuation and x-ms-max-item-count.

It doesn't appear that putting these as entries in the options dictionary of document_client.QueryDocuments() does the trick.

In the GitHub repository, the Read() method references the options parameter:

    headers = base.GetHeaders(self,
                              initial_headers,
                              'get',
                              path,
                              id,
                              type,
                              options)
    # Read will use ReadEndpoint since it uses GET operation
    url_connection = self._global_endpoint_manager.ReadEndpoint
    result, self.last_response_headers = self.__Get(url_connection,
                                                    path,
                                                    headers)

Looking in base.py, where GetHeaders() is defined, I saw these two blocks of code:

    if options.get('continuation'):
        headers[http_constants.HttpHeaders.Continuation] = (
            options['continuation'])

    if options.get('maxItemCount'):
        headers[http_constants.HttpHeaders.PageSize] = options['maxItemCount']

This would appear to correspond to the two parameters above. However, when I set them as options in the query ({'continuation':True,'maxItemCount':10}), nothing changes.

The final query looks like this:

    client.QueryDocuments(collection_link, query, {'continuation': True, 'maxItemCount': 10})

I have also tried using a string instead of an int for maxItemCount.

What am I doing incorrectly here?

Edit: The headers are the same as the two from the documentation above, from http_constants.py:

    # Our custom DocDB headers
    Continuation = 'x-ms-continuation'
    PageSize = 'x-ms-max-item-count'

3 Answers

1
votes

A continuation token works like this: when you query documents and more documents match than were returned, the service gives you back a marker (the token) that you must include in your next query. The token tells the service to resume from that marker instead of starting from the beginning. It is a value issued by the service, not a flag you switch on.

So in your code, the very first query has no continuation parameter (or null). When you get the result, check whether the service returned a token. If no token is returned, there is no more data available. If a token is returned, include it in the query options of the second query.
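
The loop described above can be sketched without touching the real service. In this minimal example, `query_page` is a hypothetical stand-in for the service (it is not part of the DocumentDB library), and the token is simply an offset encoded as a string; real tokens are opaque and should only be passed back unchanged.

```python
def query_page(documents, max_item_count, continuation=None):
    """Hypothetical stand-in for the service: return one page plus a token."""
    # No token means "start from the beginning".
    start = int(continuation) if continuation else 0
    end = start + max_item_count
    page = documents[start:end]
    # A token is returned only while more documents remain.
    next_token = str(end) if end < len(documents) else None
    return page, next_token


def fetch_all(documents, max_item_count):
    """Client side: keep querying until no token comes back."""
    pages = []
    token = None  # the first query carries no continuation token
    while True:
        page, token = query_page(documents, max_item_count, token)
        pages.append(page)
        if token is None:
            break
    return pages


docs = ["doc%d" % i for i in range(25)]
print([len(p) for p in fetch_all(docs, 10)])  # [10, 10, 5]
```

Passing `True` as the continuation value cannot work in this model either: the client never invents the token, it only echoes back what the previous response contained.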

1
votes

It turns out that paging has to be driven from the object that QueryDocuments() returns, by calling its _fetch_function(options) method:

    q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
    # First page: no continuation token yet
    results_1 = q._fetch_function({'maxItemCount': 10})
    # The token is a string representing a JSON object
    token = results_1[1]['x-ms-continuation']
    # Second page: pass the token back to resume where the first page ended
    results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})

The data is in results_[n][0], and the response headers from the call are in results_[n][1].
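
For an API that hands one page per request back to its own client, the pattern above could be wrapped roughly as follows. This is only a sketch: `get_page` and `fake_fetch` are hypothetical names, and `fake_fetch` imitates the (data, headers) pair described above so the example runs without the library. With the real client you would presumably pass `q._fetch_function` instead, keeping in mind that the leading underscore marks it as internal by Python convention.

```python
CONTINUATION_HEADER = 'x-ms-continuation'


def get_page(fetch_function, page_size, token=None):
    """Fetch one page and report the token needed for the next one."""
    options = {'maxItemCount': page_size}
    if token:
        # Only send a continuation value that came from a previous response.
        options['continuation'] = token
    result = fetch_function(options)
    documents, headers = result[0], result[1]
    # A missing header means there is nothing left to fetch.
    return {'documents': documents,
            'next_token': headers.get(CONTINUATION_HEADER)}


def fake_fetch(options):
    """Hypothetical stand-in returning (data, headers) for 25 fake items."""
    items = list(range(25))
    start = int(options.get('continuation', 0))
    end = start + options['maxItemCount']
    headers = {CONTINUATION_HEADER: str(end)} if end < len(items) else {}
    return items[start:end], headers


first = get_page(fake_fetch, 10)
second = get_page(fake_fetch, 10, first['next_token'])
print(len(first['documents']), first['next_token'], len(second['documents']))
```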

0
votes

You can also get the results in pages using fetch_next_block(). Note that the user's code should not expose the continuation token.

    q = db_source._client.QueryDocuments(collection_link, query, {'maxItemCount': 10, 'continuation': True})
    results = q.fetch_next_block()

ref: https://github.com/Azure/azure-documentdb-python/issues/98
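
To read more than one page this way, fetch_next_block() can be called repeatedly. The sketch below assumes that an empty block signals the end of the results, which is worth verifying against your library version. `iter_blocks` and `FakeQuery` are hypothetical names; `FakeQuery` only imitates the object returned by QueryDocuments so the example runs on its own.

```python
def iter_blocks(query_iterable, max_blocks=None):
    """Yield blocks from fetch_next_block() until one comes back empty.

    Assumption: an empty block means the query is exhausted.
    """
    count = 0
    while max_blocks is None or count < max_blocks:
        block = query_iterable.fetch_next_block()
        if not block:
            return
        yield block
        count += 1


class FakeQuery:
    """Hypothetical stand-in for the object returned by QueryDocuments."""

    def __init__(self, items, page_size):
        self._items = list(items)
        self._page_size = page_size

    def fetch_next_block(self):
        # Hand out the next slice and remember what is left.
        block = self._items[:self._page_size]
        self._items = self._items[self._page_size:]
        return block


for block in iter_blocks(FakeQuery(range(25), 10)):
    print(len(block))
```

The `max_blocks` argument lets a caller stop after a fixed number of pages instead of draining the whole query.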