4
votes

I am trying to understand the difference between the following two code segments. The first uses pages to get scan results, and the second doesn't. Would the second approach still work if the total number of items in the table is very large? The AWS docs say that a scan result is limited to 1 MB. How does this affect version 2? Will it only return the first 1 MB of results, or will it still make further database calls after each page?

Note that I am using the table.scan API, which is different from the DynamoDBClient.scan API. See http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/document/Table.html for API details.

Version 1 (using pages):

            ItemCollection<ScanOutcome> items = table.scan(spec);
            items.pages().forEach(page -> {
                for (Item item : page) {
                    response.add(item);
                }
            });

Version 2 (iterating over items without pages):

            ItemCollection<ScanOutcome> items = table.scan(spec);
            for (Item item : items) {
                response.add(item);
            }

2 Answers

4
votes

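Tofig is correct: there is no difference between those two methods. The 1 MB limit applies to each individual Scan request made through the low-level API. The Document API still makes those requests under the hood, but ItemCollection issues the follow-up requests for you as you iterate, so you are not limited to the first 1 MB of results.
From the documentation of ItemCollection: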

A collection of Item's. An ItemCollection object maintains a cursor pointing to its current pages of data. Initially the cursor is positioned before the first page. The next method moves the cursor to the next row, and because it returns false when there are no more rows in the ItemCollection object, it can be used in a while loop to iterate through the collection. Network calls can be triggered when the collection is iterated across page boundaries.

1
votes

I ran an experiment: I created 1000 records of 5 KB each, then used version 2 to scan the table and still got all 1000 records, even though the total size is clearly greater than 1 MB. Both versions scanned the whole table, so there appears to be no difference. It seems that ItemCollection handles pagination for you, and there is no need to use pages unless you want to control network calls and page size.
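
To make the "handles pagination for you" part concrete, here is a minimal, SDK-free sketch of how an iterable can hide page boundaries. It is not the actual ItemCollection implementation: `FakeTable`, `LazyItems`, and `fetchPage` are hypothetical names, and the page limit is modelled simply as "as many whole 5 KB items as fit in 1 MB", which ignores DynamoDB's real per-item size accounting.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// SDK-free sketch; every name in this file is hypothetical, not part of the AWS SDK.
class LazyPagedScanSketch {

    /** Stands in for the remote table; each fetchPage call models one Scan request. */
    static class FakeTable {
        private final int totalItems;
        private final int itemSizeBytes;
        private final int pageLimitBytes;
        int requestCount = 0;

        FakeTable(int totalItems, int itemSizeBytes, int pageLimitBytes) {
            this.totalItems = totalItems;
            this.itemSizeBytes = itemSizeBytes;
            this.pageLimitBytes = pageLimitBytes;
        }

        /** Returns item ids from startIndex up to the (simplified) byte limit. */
        List<Integer> fetchPage(int startIndex) {
            requestCount++;
            int itemsPerPage = Math.max(1, pageLimitBytes / itemSizeBytes);
            int end = Math.min(totalItems, startIndex + itemsPerPage);
            List<Integer> page = new ArrayList<>();
            for (int i = startIndex; i < end; i++) {
                page.add(i);
            }
            return page;
        }
    }

    /** Iterable that fetches the next page only when the current one is exhausted. */
    static class LazyItems implements Iterable<Integer> {
        private final FakeTable table;

        LazyItems(FakeTable table) {
            this.table = table;
        }

        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private List<Integer> page = new ArrayList<>();
                private int posInPage = 0;
                private int nextStart = 0;

                @Override
                public boolean hasNext() {
                    // Crossing a page boundary triggers the next simulated request.
                    while (posInPage >= page.size() && nextStart < table.totalItems) {
                        page = table.fetchPage(nextStart);
                        nextStart += page.size();
                        posInPage = 0;
                    }
                    return posInPage < page.size();
                }

                @Override
                public Integer next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return page.get(posInPage++);
                }
            };
        }
    }

    public static void main(String[] args) {
        // Same shape as the experiment: 1000 items of 5 KB, 1 MB per request.
        FakeTable table = new FakeTable(1000, 5 * 1024, 1024 * 1024);
        int seen = 0;
        for (int item : new LazyItems(table)) {
            seen++;
        }
        System.out.println("items=" + seen + ", requests=" + table.requestCount);
    }
}
```

Under these assumptions, a plain for-each loop visits all 1000 items while the sketch issues 5 simulated requests behind the scenes (204 items fit in each of the first four pages, 184 in the last). That is the same pattern as version 2: the caller never sees the page boundaries.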