
I have log-type data with no natural key. Amazon DynamoDB requires a hash attribute in the table's primary key, so I plan to use a UUID. The problem is that it seems I need to fix the hash value when querying, but I of course want to query over all logs, so I can't specify a single UUID. Do I misunderstand this DynamoDB query requirement?


2 Answers


You do not misunderstand the requirement.

The only way to avoid a full table scan is by querying against a specific HashKey.

How do you want to query the data? Would it perhaps make sense to use a date (possibly at hour resolution) as your hash key and create a local secondary index on the UUID?
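For example, here is a minimal sketch of that layout using boto3 in Python. The table name "Logs", hash key "log_hour", index name "uuid-index", and attribute "uuid" are all hypothetical, not anything from your existing setup:

```python
# Sketch only: assumes a hypothetical "Logs" table whose hash key is an
# hour-resolution date string ("log_hour"), with a local secondary index
# ("uuid-index") whose range key is the "uuid" attribute.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
logs = dynamodb.Table("Logs")  # hypothetical table name

# Fetch every log item written during one hour bucket.
hour_bucket = logs.query(
    KeyConditionExpression=Key("log_hour").eq("2015-06-01T13")
)["Items"]

# Use the local secondary index to look up a specific UUID inside that bucket.
one_log = logs.query(
    IndexName="uuid-index",  # hypothetical LSI name
    KeyConditionExpression=(
        Key("log_hour").eq("2015-06-01T13") & Key("uuid").eq("some-uuid-value")
    ),
)["Items"]
```

Querying over "all logs for an hour" then becomes a single efficient query per hour bucket instead of a full table scan.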


If you want to optimize performance and throughput provisioning, I would suggest finding a way to use the Hash Key in your query and then a Filter Expression to narrow the records according to your needs (e.g. where a < latitude < b and c < longitude < d).

See Specifying Conditions with Condition Expressions for more details.
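As a rough illustration, assuming the same hypothetical hour-bucketed "Logs" table as above plus hypothetical "latitude" and "longitude" attributes, a key condition combined with a filter expression in boto3 might look like this (note that between() is inclusive, so it approximates the strict a < latitude < b condition):

```python
# Sketch only: the table, key, and attribute names are illustrative.
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource("dynamodb")
logs = dynamodb.Table("Logs")  # hypothetical table name

response = logs.query(
    # The key condition does the efficient lookup against the hash key.
    KeyConditionExpression=Key("log_hour").eq("2015-06-01T13"),
    # The filter expression trims the matched items to a bounding box. Read
    # capacity is still consumed for everything the key condition matched;
    # the filter only narrows what is returned.
    FilterExpression=(
        Attr("latitude").between(40.0, 41.0)
        & Attr("longitude").between(-74.5, -73.5)
    ),
)
matching_items = response["Items"]
```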

If using a Hash Key in your query is not possible and you have to fall back to Scan with a filter expression, I would then suggest segmenting your tables by date or time, following the suggested Time Series Data Best Practices, since you mentioned that you need to query the data across time:

Instead of storing all items in a single table, you could use multiple tables to store these items. For example, you could create tables to store monthly or weekly data. For the table storing data from the latest month or week, where the data access rate is high, you could request higher throughput, and for tables storing older data you could dial down the throughput and save on resources.

You can save on resources by storing "hot" items in one table with higher throughput settings and "cold" items in another table with lower throughput settings. You can remove old items by simply deleting their tables, and you can optionally back those tables up to other storage options such as Amazon Simple Storage Service (Amazon S3). Deleting an entire table is significantly more efficient than removing items one by one, which would essentially double the write throughput you consume, since you would perform as many delete operations as put operations.
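As a rough sketch of that table-per-period pattern (the table naming scheme and helpers below are hypothetical, purely for illustration), writes go to the current month's table and an old month is dropped with a single DeleteTable call:

```python
# Sketch only: assumes hypothetical monthly tables named like "logs-2015-06".
import datetime
import boto3

dynamodb = boto3.resource("dynamodb")

def table_name_for(ts: datetime.datetime) -> str:
    # One table per month, e.g. "logs-2015-06".
    return "logs-" + ts.strftime("%Y-%m")

def put_log(item: dict, ts: datetime.datetime) -> None:
    # Writes always land in the "hot" table for the item's month.
    dynamodb.Table(table_name_for(ts)).put_item(Item=item)

def drop_month(ts: datetime.datetime) -> None:
    # Removing a whole month of old logs is a single DeleteTable call instead
    # of one delete per item (back the table up to S3 first if needed).
    dynamodb.Table(table_name_for(ts)).delete()
```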