3
votes

The documentation for the boto3 DynamoDB paginators says that NextToken is returned when paging, and that you then pass that token as StartingToken in the next query to resume a paging session (as you would when accessing information via a RESTful API).

However, my testing shows that the results don't contain NextToken, but rather LastEvaluatedKey. I thought I could use LastEvaluatedKey as the token, but that doesn't work.

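import boto3

client = boto3.client('dynamodb')
paginator = client.get_paginator('scan')
# PaginationConfig accepts MaxItems, PageSize and StartingToken; 'MaxSize' is not one of its keys
page_iterator = paginator.paginate(TableName='test1', PaginationConfig={'PageSize': 1, 'MaxItems': 5000})

# Look at the first page only
for page in page_iterator:
    print(page)
    break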

I would expect the page object returned from page_iterator to include a NextToken key, but it does not:

{'Items': [{'PK': {'S': '99'}, 'SK': {'S': '99'}, 'data': {'S': 'Test Item 99'}}], 'Count': 1, 'ScannedCount': 1, 'LastEvaluatedKey': {'PK': {'S': '99'}, 'SK': {'S': '99'}}, 'ResponseMetadata': {'RequestId': 'DUE559L8KVKVH8H7G0G2JH0LUNVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 27 May 2019 14:22:09 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '153', 'connection': 'keep-alive', 'x-amzn-requestid': 'DUE559L8KVKVH8H7G0G2JH0LUNVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '3759060959'}, 'RetryAttempts': 0}}

What am I missing?

UPDATE: Is this somehow related to the question "How to use Boto3 pagination"?

1
Why do you need the token when you are already getting page_iterator to iterate over pages? – dDarkLORD
Because I am returning this page of data to an HTTP API client. I'd also need to return that token so they could come back and get the next page on a subsequent request. – jr.

1 Answer

5
votes

There are a few ways to address this using the boto3 scan paginator.

The first option is to call build_full_result like so:

result = paginator.paginate(TableName="your_table", PaginationConfig={"MaxItems":10, "PageSize": 10}).build_full_result()

That returns a response with 10 items, and 'NextToken' is populated provided there are more than 10 items. This is probably the simplest way: treat MaxItems as the page size you return, and when 'NextToken' is missing from the result you have reached the end of the scan.

I noticed that if you don't specify a page size, the results are the same, but the consumed capacity and 'ScannedCount' are higher. Presumably this is because each underlying Scan call then reads as much as it can (up to 1 MB) instead of stopping after PageSize items.
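
For the HTTP API use case from the comments, the first option could be wrapped roughly as below. This is only a sketch: scan_page is a hypothetical helper name (not part of boto3), and client is assumed to be a boto3 DynamoDB client such as boto3.client("dynamodb").

```python
def scan_page(client, table_name, page_size, starting_token=None):
    """Return (items, next_token) for one API-sized page of a scan.

    Hypothetical helper; `client` is assumed to be a boto3 DynamoDB client.
    """
    config = {"MaxItems": page_size, "PageSize": page_size}
    if starting_token:
        # Token handed back by a previous call; left out on the first request.
        config["StartingToken"] = starting_token

    paginator = client.get_paginator("scan")
    result = paginator.paginate(
        TableName=table_name, PaginationConfig=config
    ).build_full_result()

    # 'NextToken' is only present when more items remain, so .get() yields
    # None once the scan is finished.
    return result.get("Items", []), result.get("NextToken")
```

The caller would return next_token to the HTTP client alongside the items and pass it back in as starting_token on the following request; a next_token of None means there is nothing left to fetch.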

Another way is to do the encoding of the 'StartingToken' using the TokenEncoder in botocore.paginate directly.

If the initial call to the paginator is like:

pagination_config = {
    "MaxItems": 5000,
    "PageSize": 10,
}

# 'client' is the DynamoDB client from the question
scan_paginator = client.get_paginator("scan")
scan_iterator = scan_paginator.paginate(
    TableName="your_table_name",
    PaginationConfig=pagination_config
)

The paged results will be as the question describes. The first 10 results will be returned in the first page, and 'NextToken' isn't specified but 'LastEvaluatedKey' is.

To use it, wrap the returned 'LastEvaluatedKey' in a dict under the key 'ExclusiveStartKey', encode that dict, and pass the result as the 'StartingToken' in the pagination config.

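from botocore.paginate import TokenEncoder

encoder = TokenEncoder()
encoded_token = None
for page in scan_iterator:
    if "LastEvaluatedKey" in page:
        # Wrap the key as the scan API's 'ExclusiveStartKey' before encoding
        encoded_token = encoder.encode({"ExclusiveStartKey": page["LastEvaluatedKey"]})
    break  # stop after the first page; encoded_token resumes right after it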

Then:

pagination_config = {
    "MaxItems": 500,
    "PageSize": 10,
    "StartingToken": encoded_token
}

The reason to encode the primary key as the 'ExclusiveStartKey' is that this is what the actual scan API expects. Essentially, the paginators encode and decode the 'LastEvaluatedKey' and 'ExclusiveStartKey' into the 'NextToken' and 'StartingToken' values. If you base64-decode the 'NextToken' returned by build_full_result, you'll see that it also uses the 'ExclusiveStartKey'.
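
To check this yourself, a token can be inspected with just the standard library. The sketch below assumes the token is base64-encoded JSON, which is what I see for keys made of string attributes; decode_token is a hypothetical helper name, and the sample token is built by hand from the key in the question rather than captured from a real response. Keys with binary attributes may be encoded differently, so this may not apply to them.

```python
import base64
import json


def decode_token(token):
    """Decode a paginator token, assuming it is base64-encoded JSON."""
    return json.loads(base64.b64decode(token).decode("utf-8"))


# Hand-built sample in the shape described above (not a real response).
sample_key = {"PK": {"S": "99"}, "SK": {"S": "99"}}
sample_token = base64.b64encode(
    json.dumps({"ExclusiveStartKey": sample_key}).encode("utf-8")
).decode("utf-8")

# Shows the 'ExclusiveStartKey' wrapper around the original key.
print(decode_token(sample_token))
```

A real 'NextToken' from build_full_result may carry additional bookkeeping fields next to 'ExclusiveStartKey', so treat the decoded dict as something to look at rather than a stable format.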