
It's unclear to me, after reading the docs, how many read capacity units a scan operation with a filter consumes in DynamoDB. For example, with this Ruby request:

table.items.where(:MyAttribute => "Some Value").each do |item_data|
  # do something with the item_data
end

My understanding is that this results in a table scan, but DynamoDB only returns the items I'm interested in. If my table has 10,000 items and only 5 of them pass my filter, am I still being "charged" for a huge number of read capacity units?

The attribute I'm filtering on is not a hash key, range key, or secondary index. I had to add that attribute recently, and unexpectedly, which is why I'm not using a query instead.


1 Answer


In short, you are "charged" for the total number of items scanned, not the number of items returned. Compared to a query, a scan is (as you already mentioned) an expensive operation.
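You can see this directly in the scan response, which reports both Count (items that passed the filter) and ScannedCount (items you paid for). A minimal sketch using the newer aws-sdk-dynamodb gem rather than the v1 interface from the question; the table and attribute names are placeholders:

require "aws-sdk-dynamodb"

client = Aws::DynamoDB::Client.new

resp = client.scan(
  table_name: "MyTable",                        # placeholder table name
  filter_expression: "MyAttribute = :v",
  expression_attribute_values: { ":v" => "Some Value" },
  return_consumed_capacity: "TOTAL"
)

# scanned_count is what you are billed for; count is what survived the filter
puts "Returned items:  #{resp.count}"
puts "Scanned items:   #{resp.scanned_count}"
puts "Capacity units:  #{resp.consumed_capacity.capacity_units}"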

It's also worth mentioning that invoking a scan does not necessarily scan the whole table. If the size of the scanned items exceeds the 1 MB limit, the scan stops and you have to invoke it again to scan the next portion of the table.

This is taken from the official docs:

If the total number of scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation. The results also include the number of items exceeding the limit. A scan can result in no table data meeting the filter criteria.
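In practice that means looping until LastEvaluatedKey comes back empty, feeding it into the next request as ExclusiveStartKey. A sketch under the same assumptions as above:

params = {
  table_name: "MyTable",
  filter_expression: "MyAttribute = :v",
  expression_attribute_values: { ":v" => "Some Value" }
}
items = []

loop do
  resp = client.scan(params)
  items.concat(resp.items)

  # a nil last_evaluated_key means the final 1 MB page has been scanned
  break unless resp.last_evaluated_key
  params[:exclusive_start_key] = resp.last_evaluated_key
end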

The filter is applied after the scan, to the items that were already read, so it does not reduce the consumed throughput at all.

If you are going to perform these operations regularly, it may be worth adding a secondary index or rethinking your hash and range keys.
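For illustration, if you added a global secondary index with MyAttribute as its hash key (the index name MyAttribute-index below is hypothetical), the same lookup becomes a query, and only the matching items consume capacity:

resp = client.query(
  table_name: "MyTable",
  index_name: "MyAttribute-index",              # hypothetical GSI on MyAttribute
  key_condition_expression: "MyAttribute = :v",
  expression_attribute_values: { ":v" => "Some Value" }
)

resp.items.each do |item|
  # only the 5 matching items are read and billed, not all 10,000
end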