DynamoDB Table.scan with and without pagination

Question

I am trying to understand the difference between the following two code segments. The first uses pages to get the scan results, and the second doesn't. Would the second approach still work if the total number of items in the table is very large? The AWS docs say that a scan result is limited to 1 MB. How does this affect version 2? Will it only get the first 1 MB of results, or will it still make further database calls after each page?

Note that I am using the Table.scan API, which is different from the DynamoDBClient.scan API. See http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/document/Table.html for API details.

Version 1 (using pages):

    ItemCollection<ScanOutcome> items = table.scan(spec);
    // Walk the result page by page, then item by item within each page.
    items.pages().forEach(page -> {
        for (Item item : page) {
            response.add(item);
        }
    });

Version 2 (iterating over items without pages):

    ItemCollection<ScanOutcome> items = table.scan(spec);
    // Iterate over the items directly, without touching pages.
    for (Item item : items) {
        response.add(item);
    }

PKuhn · Accepted Answer · 2017-01-17T08:22:02

Tofig is correct: there is no difference between those two methods. The 1 MB limit applies to each individual Scan request made through the low-level API. With the Document API, the ItemCollection requests the next page for you as you iterate, so Version 2 is not cut off after the first 1 MB.
From the documentation of ItemCollection:

A collection of Item's. An ItemCollection object maintains a cursor pointing to its current pages of data. Initially the cursor is positioned before the first page. The next method moves the cursor to the next row, and because it returns false when there are no more rows in the ItemCollection object, it can be used in a while loop to iterate through the collection. Network calls can be triggered when the collection is iterated across page boundaries.
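
To make the "network calls across page boundaries" behaviour concrete, here is a minimal sketch in plain Java. It does not use the AWS SDK and is not how the SDK is implemented. FakeTable, LazyItems and collectAll are hypothetical names invented for illustration, and the integer offset only stands in for the role that LastEvaluatedKey plays in a real scan. It shows that a plain for-each loop, as in Version 2, can still reach every item when the iterator fetches the next page on demand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PagedScanDemo {

    /** Hypothetical stand-in for a remote table that returns at most pageSize rows per call. */
    static class FakeTable {
        private final List<String> rows;
        private final int pageSize;
        int calls = 0; // counts simulated network calls

        FakeTable(List<String> rows, int pageSize) {
            if (pageSize < 1) {
                throw new IllegalArgumentException("pageSize must be at least 1");
            }
            this.rows = rows;
            this.pageSize = pageSize;
        }

        /** Returns one page of rows beginning at offset start. */
        List<String> fetchPage(int start) {
            calls++;
            int end = Math.min(start + pageSize, rows.size());
            return new ArrayList<>(rows.subList(start, end));
        }

        int size() {
            return rows.size();
        }
    }

    /** Hypothetical item collection: iterating it fetches pages lazily. */
    static class LazyItems implements Iterable<String> {
        private final FakeTable table;

        LazyItems(FakeTable table) {
            this.table = table;
        }

        @Override
        public Iterator<String> iterator() {
            return new Iterator<String>() {
                private List<String> page = new ArrayList<>();
                private int indexInPage = 0;
                private int nextStart = 0; // assumption: plays the role of LastEvaluatedKey

                @Override
                public boolean hasNext() {
                    // At a page boundary, fetch the next page if rows remain.
                    while (indexInPage >= page.size() && nextStart < table.size()) {
                        page = table.fetchPage(nextStart);
                        nextStart += page.size();
                        indexInPage = 0;
                    }
                    return indexInPage < page.size();
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return page.get(indexInPage++);
                }
            };
        }
    }

    /** Mirrors Version 2 from the question: a plain for-each with no explicit pages. */
    static List<String> collectAll(FakeTable table) {
        List<String> response = new ArrayList<>();
        for (String item : new LazyItems(table)) {
            response.add(item);
        }
        return response;
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable(Arrays.asList("a", "b", "c", "d", "e", "f", "g"), 3);
        List<String> response = collectAll(table);
        // prints "7 items, 3 calls"
        System.out.println(response.size() + " items, " + table.calls + " calls");
    }
}
```

With seven rows and a page size of three, the loop receives all seven items and the fake table is called three times, once per page. The caller never sees the page boundaries.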
