DynamoDB parallel scan using the Table.scan API in Java

Question

I would appreciate help from anyone familiar with how DynamoDB works. I need to perform a scan on a large DynamoDB table. I know that the DynamoDBClient scan operation is limited to 1 MB of returned data per call. Does the same restriction apply to the Table.scan operation? The thing is that Table.scan returns output of type "ItemCollection<ScanOutcome>", while the DynamoDBClient scan returns a ScanResult, and it is not clear to me whether these operations work in a similar way or not.

I have checked this example: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ScanJavaDocumentAPI.html, but it doesn't contain any hints about using the last returned key.

My questions are: Do I still need to make scan calls in a loop until the last returned key (LastEvaluatedKey) is null if I use Table.scan? If yes, how do I get the last key? If not, how can I enforce pagination? Any links to code examples would be appreciated. I have spent some time googling for examples, but most of them use either DynamoDBClient or DynamoDBMapper, while I need to use Table and Index objects instead.

Thanks!

You said you have a very large table, but you are looking for something in particular (or a set), so you can start by filtering your results (which is obvious, I guess). If that is not enough: yes, you have to keep searching in the next batch(es). — x80486
I am not sure I understood your comment. I do have a FilterExpression that filters my scan results, but that doesn't guarantee that my results will never exceed 1 MB. — Tofig Hasanov
So, you need to scan the next batch; you can do it in parallel by "playing" with Segment and TotalSegments. In that case the value of LastEvaluatedKey returned from the request must be used as the ExclusiveStartKey with the same segment ID in a subsequent scan operation. It's pretty much like SQL, but faster! — x80486
There is no "LastEvaluatedKey" parameter in the Table.scan output type. — Tofig Hasanov
Why wouldn't pages() work for you? docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/… — kuhajeyan

Alexander Patrikalakis · Accepted Answer · 2017-02-12T11:59:43

1

votes

If you iterate over the output of Table.scan(), the SDK will do pagination for you.
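
To make the mechanics concrete, here is a minimal, dependency-free sketch of what happens underneath when results are paginated, and how the Segment/TotalSegments idea from the comments fits in. It does not call the AWS SDK: `Page`, `scanPage`, `scanSegment`, and `parallelScan` are hypothetical stand-ins, the "table" is just a sorted list of integer keys, and segments are assigned by `key % totalSegments` purely for illustration (DynamoDB decides segment membership itself).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ScanPaginationSketch {

    // Hypothetical stand-in for one scan response: a page of items plus
    // the key to resume from (null when the segment is exhausted).
    static final class Page {
        final List<Integer> items;
        final Integer lastEvaluatedKey;

        Page(List<Integer> items, Integer lastEvaluatedKey) {
            this.items = items;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    // Hypothetical stand-in for a single low-level scan call. Returns at most
    // pageSize keys of the given segment that come after exclusiveStartKey.
    // Assumes the table list is sorted in ascending order.
    static Page scanPage(List<Integer> table, int segment, int totalSegments,
                         Integer exclusiveStartKey, int pageSize) {
        List<Integer> items = new ArrayList<>();
        Integer lastKey = null;
        for (int key : table) {
            if (Math.floorMod(key, totalSegments) != segment) continue;
            if (exclusiveStartKey != null && key <= exclusiveStartKey) continue;
            if (items.size() == pageSize) {
                // The page is full and more data remains: report where to resume.
                lastKey = items.get(items.size() - 1);
                break;
            }
            items.add(key);
        }
        return new Page(items, lastKey);
    }

    // The loop an auto-paginating iterator runs for you: feed each
    // lastEvaluatedKey back in as the next exclusiveStartKey until it is null.
    static List<Integer> scanSegment(List<Integer> table, int segment,
                                     int totalSegments, int pageSize) {
        List<Integer> out = new ArrayList<>();
        Integer startKey = null;
        do {
            Page page = scanPage(table, segment, totalSegments, startKey, pageSize);
            out.addAll(page.items);
            startKey = page.lastEvaluatedKey;
        } while (startKey != null);
        return out;
    }

    // Parallel scan: one worker per segment, each paginating independently
    // and always reusing its own segment number.
    static List<Integer> parallelScan(List<Integer> table, int totalSegments, int pageSize) {
        ExecutorService pool = Executors.newFixedThreadPool(totalSegments);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (int s = 0; s < totalSegments; s++) {
                final int segment = s;
                Callable<List<Integer>> task =
                        () -> scanSegment(table, segment, totalSegments, pageSize);
                futures.add(pool.submit(task));
            }
            List<Integer> all = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } catch (Exception e) {
            throw new IllegalStateException("segment scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 25; i++) table.add(i);
        // Results arrive grouped by segment, not in key order.
        System.out.println(parallelScan(table, 3, 4).size());
    }
}
```

With the real Document API, my understanding is that you only need this loop if you want to control paging yourself. In that case `ItemCollection.pages()` should give you each page, the last key should be reachable via `page.getLowLevelResult().getScanResult().getLastEvaluatedKey()`, and `ScanSpec` should accept `withSegment`, `withTotalSegments`, and `withExclusiveStartKey` for the parallel case. Please check the Javadoc for your SDK version before relying on those names.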
