I'm wondering what the best way is to set up the keys for a table holding activity stream data. Each activity type will have different attributes (with some in common). Here is an example of what some items will contain:

A follow activity:

  • type
  • user_id
  • timestamp
  • follower_user_id
  • followee_user_id

A comment activity:

  • type
  • user_id
  • timestamp
  • comment_id
  • commenter_user_id
  • commented_user_id

To display the stream I will query on user_id and order by timestamp. There will also be other types of queries - for example, I will occasionally need to query on user_id AND type, as well as on attributes like comment_id, follower_user_id, etc.

So my questions are:

  1. Should my primary key be a hash and range key using user_id and timestamp?
  2. Do I need a secondary index for every other attribute (e.g. comment_id), or will results return quickly enough without one? Secondary indexes are limited to 5, which wouldn't be enough for all the types of queries I will need to perform.

1 Answer

I'd consider whether you could segment the data into two (or more) tables, so that each table's keys can fit its own queries. Combine the results as (and if) needed, i.e. your type becomes your table rather than a discriminator column as it would be in SQL.
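
To make the "combine as needed" part concrete, here is a minimal sketch in plain Python (no AWS calls). It assumes each per-type table has already been queried for one user and returned its items sorted by timestamp; the sample items and the `merged_stream` helper are hypothetical, not part of any SDK.

```python
import heapq

# Hypothetical query results for user_id "u1" from two per-type tables,
# each already sorted by timestamp in ascending order.
follows = [
    {"type": "follow", "user_id": "u1", "timestamp": 100,
     "follower_user_id": "u2", "followee_user_id": "u1"},
    {"type": "follow", "user_id": "u1", "timestamp": 300,
     "follower_user_id": "u3", "followee_user_id": "u1"},
]
comments = [
    {"type": "comment", "user_id": "u1", "timestamp": 200, "comment_id": "c1",
     "commenter_user_id": "u2", "commented_user_id": "u1"},
]


def merged_stream(*per_type_results):
    """Merge per-type result lists (each sorted ascending by timestamp)
    into a single activity stream, newest first."""
    merged = heapq.merge(*per_type_results, key=lambda item: item["timestamp"])
    return list(merged)[::-1]


stream = merged_stream(follows, comments)
```

The merge happens in the application, so the cost of showing a combined stream is one query per type plus a cheap in-memory merge.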

If you don't separate the tables, then my answers would be:

  1. Yes - I think that would be the best bet, given that it seems to be how you will query the table most of the time.
  2. No. But you do need to consider which queries are the most frequent and what their performance requirements are. Which ones need to be fast, and for which ones is "good enough" good enough?
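
As a rough illustration of the trade-off in both answers, here is a toy in-memory model in plain Python, not real DynamoDB code. The `ActivityTable` class and its method names are made up for this sketch. It assumes the primary key is (user_id, timestamp) and that one secondary index shares the hash key and uses comment_id as its range key, which is how a local secondary index is shaped.

```python
class ActivityTable:
    """Toy stand-in for a table keyed on user_id (hash) and timestamp (range)."""

    def __init__(self):
        self._items = {}          # user_id -> {timestamp: item}
        self._comment_index = {}  # (user_id, comment_id) -> item

    def put(self, item):
        # Same (user_id, timestamp) overwrites, as a primary key would.
        self._items.setdefault(item["user_id"], {})[item["timestamp"]] = item
        if "comment_id" in item:
            self._comment_index[(item["user_id"], item["comment_id"])] = item

    def query(self, user_id, activity_type=None):
        """Key lookup on user_id, newest first, optionally filtered by type."""
        partition = self._items.get(user_id, {})
        ordered = [partition[ts] for ts in sorted(partition, reverse=True)]
        if activity_type is None:
            return ordered
        return [item for item in ordered if item["type"] == activity_type]

    def scan(self, attribute, value):
        """No index: every item in the table has to be examined."""
        return [item
                for partition in self._items.values()
                for item in partition.values()
                if item.get(attribute) == value]

    def get_comment(self, user_id, comment_id):
        """Indexed lookup: direct access, but only for the indexed attribute."""
        return self._comment_index.get((user_id, comment_id))


table = ActivityTable()
table.put({"type": "follow", "user_id": "u1", "timestamp": 100,
           "follower_user_id": "u2", "followee_user_id": "u1"})
table.put({"type": "comment", "user_id": "u1", "timestamp": 200, "comment_id": "c1",
           "commenter_user_id": "u2", "commented_user_id": "u1"})
table.put({"type": "follow", "user_id": "u1", "timestamp": 300,
           "follower_user_id": "u3", "followee_user_id": "u1"})
```

The stream query and the user_id AND type query both stay inside one user's partition, so they are cheap. Only lookups that do not start from user_id, such as `scan("follower_user_id", ...)`, touch the whole table, and those are the ones to weigh against your limited number of indexes.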

A combination of caching and asynchronous processing can make a slow scan good enough, but it doesn't eliminate the need for some local secondary indexes.
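
For the caching half of that, a minimal sketch of a time-based cache wrapped around a slow scan might look like this. `CachedScan` and its parameters are hypothetical names for illustration, and `scan_fn` stands for whatever function performs your scan; this is one simple approach, not the only one.

```python
import time


class CachedScan:
    """Cache results of a slow scan function for a fixed number of seconds."""

    def __init__(self, scan_fn, ttl_seconds=60.0, clock=time.monotonic):
        self._scan_fn = scan_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # call arguments -> (expires_at, result)

    def __call__(self, *args):
        now = self._clock()
        hit = self._cache.get(args)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the scan entirely
        result = self._scan_fn(*args)
        self._cache[args] = (now + self._ttl, result)
        return result
```

Results can be up to `ttl_seconds` stale, which is usually acceptable for the occasional queries but not for the main stream query, so that one still wants proper keys.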