How to scan DynamoDB Primary Key without causing full data reads internally?

Question

When I scan one of our tables, including all fields, with a DynamoDB Limit of 1000, I get approx 480 items per scan, because each item is large enough that DynamoDB truncates the response at the 1MB size limit.

However, when I scan the same table and request only the Primary Key field using ProjectionExpression, I still get only approx 480 items per scan. This suggests DynamoDB is still loading each full item and then discarding everything except the primary key, instead of just pulling the keys straight from the primary index.

How do I scan only the primary index without making DynamoDB read the full items, which uses (and charges me for) unnecessary read capacity?
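One way to see what the question describes is to ask DynamoDB how much capacity each page consumed. The sketch below assumes a boto3 low-level client (`boto3.client("dynamodb")`) or any object with the same `scan` signature; `scan_primary_keys` is a hypothetical helper name. It pages through a Scan whose ProjectionExpression lists only the key attributes and reports `ConsumedCapacity` for each page. The DynamoDB documentation states that read capacity is calculated from item size, not from the amount of data returned, so a projection by itself should not be expected to lower the per-page cost.

```python
from typing import Any, Dict, Iterator, List


def scan_primary_keys(
    client: Any,
    table_name: str,
    key_names: List[str],
    page_limit: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield one dict per Scan page with the key-only
    items and the read capacity DynamoDB reports for that page.

    `client` is assumed to be boto3.client("dynamodb") or a stand-in with
    the same scan(**kwargs) signature.
    """
    # Placeholder names (#k0, #k1, ...) avoid clashes with reserved words.
    names = {f"#k{i}": name for i, name in enumerate(key_names)}
    params: Dict[str, Any] = {
        "TableName": table_name,
        "ProjectionExpression": ", ".join(names),  # e.g. "#k0, #k1"
        "ExpressionAttributeNames": names,
        "Limit": page_limit,
        "ReturnConsumedCapacity": "TOTAL",  # ask DynamoDB to report usage
    }
    while True:
        resp = client.scan(**params)
        yield {
            "items": resp.get("Items", []),
            "capacity": resp.get("ConsumedCapacity", {}).get("CapacityUnits"),
        }
        # A missing LastEvaluatedKey means the scan has reached the end.
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            break
        params["ExclusiveStartKey"] = last_key
```

Comparing the `capacity` values from a projected scan with those from an unprojected scan of the same table is a direct way to check whether the projection changes what you are charged.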

Have you tried using the Query API instead of Scan? From AWS documentation: The query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key). URL: docs.aws.amazon.com/amazondynamodb/latest/APIReference/… — krishna_mee2004
Is there a separate "primary index" in DynamoDB? I would have thought the primary index would be a clustered index with the pk + all data (whatever those might be called in NoSQL), since there would be no (?) point in an index containing only the primary keys. My best guess -- unverified speculation -- is that you need to create a Global Secondary Index with only the desired attributes, and then scan that index instead of the base table, so that DynamoDB has what you are expecting but without any surplus data to scan. — Michael - sqlbot
I think you're right @Michael-sqlbot. There's no easy way to just read primary keys, without creating a secondary index. The purpose for this is a library function which processes a table in batches, in a general way, for any of our tables. There's no guarantee a given table has a GSI, so that doesn't really help, and I guess we just have to painfully accept a literal full table scan. — JeremyTM
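For a generic library that cannot assume a GSI exists, one option is to check each table's description and use a suitable index only when one is present. The sketch below assumes `table_description` is the `"Table"` element of a DescribeTable response (for example `boto3.client("dynamodb").describe_table(TableName=...)["Table"]`); `find_full_keys_only_index` is a hypothetical name. It looks for an ACTIVE index with a KEYS_ONLY projection whose key attributes are all table key attributes. A GSI only contains items that have its key attributes, so restricting it to the table's own keys means no item can be missing from the index.

```python
from typing import Any, Dict, Optional


def find_full_keys_only_index(table_description: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: return the name of a GSI that can stand in for a
    keys-only scan of the base table, or None if there is none.

    An index qualifies when:
      * its projection is KEYS_ONLY (the table's key attributes are always
        projected into a GSI),
      * every attribute in its key schema is also a table key attribute, so
        every item in the table also appears in the index, and
      * it is ACTIVE (not still being created or backfilled).
    """
    table_keys = {k["AttributeName"] for k in table_description.get("KeySchema", [])}
    for gsi in table_description.get("GlobalSecondaryIndexes", []):
        projection = gsi.get("Projection", {}).get("ProjectionType")
        gsi_keys = {k["AttributeName"] for k in gsi.get("KeySchema", [])}
        if (
            projection == "KEYS_ONLY"
            and gsi_keys <= table_keys
            and gsi.get("IndexStatus") == "ACTIVE"
        ):
            return gsi["IndexName"]
    return None
```

If it returns a name, the batch function can pass it as `IndexName` to `scan`; if it returns `None`, the library falls back to the full table scan described above.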

Renato Byrro · Accepted Answer · 2018-06-19T01:54:24

You'll need to create a secondary index that projects only the attributes you need. Then scan that index instead of the table: each item consumes read capacity based only on its projected size, not on the original large item size.
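As a minimal sketch of that approach, assuming a boto3 low-level client, the function below builds the keyword arguments for `update_table` to add a GSI keyed on the table's own partition key with a `KEYS_ONLY` projection. The function name and the default index name `"keys-only"` are hypothetical. Because the table's key attributes are always projected into a GSI, every item's full primary key (including any sort key) ends up in the index.

```python
from typing import Any, Dict, Optional


def keys_only_gsi_request(
    table_name: str,
    partition_key: str,
    partition_key_type: str,  # "S", "N" or "B"
    index_name: str = "keys-only",  # hypothetical index name
    provisioned: Optional[Dict[str, int]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for client.update_table(**kwargs) that
    create a GSI on the table's partition key projecting KEYS_ONLY."""
    create: Dict[str, Any] = {
        "IndexName": index_name,
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        # KEYS_ONLY: index entries hold only the table and index key attributes.
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }
    if provisioned is not None:
        # Needed for tables in provisioned mode; omit for on-demand tables.
        create["ProvisionedThroughput"] = provisioned
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": partition_key_type}
        ],
        "GlobalSecondaryIndexUpdates": [{"Create": create}],
    }
```

Usage would look like `client.update_table(**keys_only_gsi_request("MyTable", "id", "S"))`, followed, once the index is ACTIVE, by `client.scan(TableName="MyTable", IndexName="keys-only")`. The trade-offs are that the index adds storage and write cost (every table write is also applied to the index), and reads from a GSI are only eventually consistent.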
