
I am looking to store a log of user events. There will be a lot of entries, so I thought DynamoDB would be a good fit, as everything else is hosted there.

I need to query these events in two ways: the total number of events for a user for a date (range), and occasionally all the events for a date.

I was thinking of storing it in one table with user id (key), sequence number (key), date, time and duration.

Should it be multiple tables? How can this be done most efficiently?

What is the sequence number? Is it supposed to be like an auto-increment field just for referencing an event, or something else? Another question: would you like to query all events for a date across all users, or all events for a specific user? – xtx
We have the sequence number to make each row unique, and we provide it ourselves. We only need to query one user at a time. – Gustaf

2 Answers

0
votes

For a small amount of data this structure is OK. Keep in mind that you have to provide the sequence number (your range key) yourself. It seems a good idea to use the date, as a Unix timestamp with millisecond accuracy, as the sort key.

There is no need for extra tables. However, the right structure depends largely on the read/write capacity you want to achieve and on the data size.

Suppose user_id is your partition key.

For every distinct partition key value, the total size of all table and index items cannot exceed 10 GB (this item-collection limit applies when the table has local secondary indexes). A single partition can support a maximum of 3,000 read capacity units or 1,000 write capacity units.

You need to design your partition keys with these limits in mind. For example, a very active user might generate so many events that you need more than 1,000 write capacity units for that user. Unfortunately, you have chosen the user id as the partition key.

In that case, writes for a single user are limited to 1,000 write capacity units, so some writes might be throttled and fail.
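To make the 1,000 WCU ceiling concrete: one write capacity unit covers one standard write per second of an item up to 1 KB, and larger items consume one unit per started KB. The following is a minimal sketch; `requiredWcu` is a hypothetical helper and the event rate and item size are illustrative assumptions, not measurements.

```javascript
// Hypothetical helper: estimates the write capacity units (WCU) that one
// partition key value needs for standard (non-transactional) writes.
// One WCU = one write per second of an item up to 1 KB; larger items are
// rounded up to the next whole KB.
function requiredWcu(writesPerSecond, itemSizeBytes) {
  const unitsPerWrite = Math.max(1, Math.ceil(itemSizeBytes / 1024));
  return writesPerSecond * unitsPerWrite;
}

// Per-partition write ceiling quoted above.
const PARTITION_WCU_LIMIT = 1000;

// Illustrative: a very active user logging 600 events/s of about 1.5 KB each.
const needed = requiredWcu(600, 1536);
console.log(needed, needed > PARTITION_WCU_LIMIT); // 1200 true
```

Note how item size matters as much as event rate: at 1.5 KB each, every event costs two units, so 600 events per second already exceed the limit.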

You need a different structure, for example partition key values such as user_id_1, user_id_2, and so on: a naming scheme that spreads each user's data across partitions according to your application's needs.
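A minimal sketch of that naming scheme (AWS documentation calls this pattern write sharding). The helper names and the shard count of 4 are assumptions for illustration. Writes pick a random suffix; reads have to query every suffix and merge the results, because any shard may hold events for the requested date.

```javascript
// Hypothetical write-sharding helpers: spread one user's events across
// N partition key values "<userId>_1" ... "<userId>_<N>".
const SHARD_COUNT = 4; // assumption: tune to the busiest user's write rate

// Pick a shard for a new event. A random suffix spreads writes evenly;
// a hash of the sequence number would also work and is reproducible.
function shardKeyForWrite(userId, shardCount = SHARD_COUNT) {
  const shard = Math.floor(Math.random() * shardCount) + 1;
  return `${userId}_${shard}`;
}

// Every partition key value a reader must query (then merge the results).
function allShardKeys(userId, shardCount = SHARD_COUNT) {
  const keys = [];
  for (let i = 1; i <= shardCount; i++) {
    keys.push(`${userId}_${i}`);
  }
  return keys;
}

console.log(shardKeyForWrite("user42")); // one of user42_1 .. user42_4
console.log(allShardKeys("user42"));
```

The trade-off is on the read side: a per-user query becomes `SHARD_COUNT` queries, so it pays to shard only as much as the write rate requires.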

Check these pages on DynamoDB limits:

Tables guidance, Partition distribution

0
votes

I would suggest the following structure for your events table:

  • user id -- hash key
  • event date/time (timestamp with milliseconds) -- range key
  • duration
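As a sketch, the structure above maps onto CreateTable parameters like these. The table name, the attribute names `userId` and `eventTime` (epoch milliseconds), and the capacity values are assumptions. Only key attributes are declared in `AttributeDefinitions`; `duration` is simply stored on each item.

```javascript
// Sketch of CreateTable parameters for the events table described above.
// Table name, attribute names and capacity values are assumptions.
const createEventsTableParams = {
  TableName: "user_events", // hypothetical name
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" },     // hash (partition) key
    { AttributeName: "eventTime", KeyType: "RANGE" }, // range (sort) key, epoch ms
  ],
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" },
    { AttributeName: "eventTime", AttributeType: "N" },
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
};

// With the AWS SDK for JavaScript v2 these go to the low-level client, e.g.
//   new AWS.DynamoDB().createTable(createEventsTableParams, callback);
console.log(JSON.stringify(createEventsTableParams.KeySchema));
```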

Using the event timestamp as the range key should be enough to make each event unique (unless a user can have multiple events in the same millisecond), so you don't need a sequence number.

With this schema, you can get all events for a user on a date with a simple query.
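A minimal sketch of that query, assuming the range key is stored as epoch milliseconds in an attribute named `eventTime` and that days are counted in UTC (both assumptions, as is the table name). The date becomes an inclusive `BETWEEN` range on the sort key.

```javascript
// Builds Query parameters for all of one user's events on one calendar day.
// Assumes eventTime holds epoch milliseconds and days are in UTC.
function buildDayQuery(userId, isoDate) {
  const start = Date.parse(isoDate + "T00:00:00Z"); // midnight UTC
  const end = start + 24 * 60 * 60 * 1000 - 1;      // last millisecond of the day
  return {
    TableName: "user_events", // hypothetical table name
    KeyConditionExpression: "userId = :uid AND #ts BETWEEN :start AND :end",
    ExpressionAttributeNames: { "#ts": "eventTime" },
    ExpressionAttributeValues: { ":uid": userId, ":start": start, ":end": end },
  };
}

const params = buildDayQuery("65716110-f4df-11e6-bc64-92361f002671", "2017-02-17");
console.log(params.ExpressionAttributeValues);
// With the v2 DocumentClient: dynamodbDoc.query(params, callback);
```

A date range works the same way: use the start of the first day and the last millisecond of the final day.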

Unfortunately, DynamoDB does not support aggregate queries, so you can't get the total number of events for a user quickly (you would have to query all the records and count them yourself). So I would suggest creating a separate table for user event statistics, like this:

  • user id -- hash key
  • date -- range key
  • events_cnt (total number of events for a user for a date)
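For the manual alternative mentioned above (counting directly from the events table), a sketch: a Query with `Select: "COUNT"` returns only the number of matching items, but results are paginated, so you must follow `LastEvaluatedKey` until it is absent. Read capacity is still consumed for every matching item, which is why the statistics table is cheaper. `countEvents` and the injected `queryPage` function are hypothetical names used to keep the sketch runnable without AWS.

```javascript
// Counts matching events by paging through Query results with
// Select: "COUNT" (items are not returned, only their number).
// queryPage: any function returning a Promise of a Query response, e.g.
//   (p) => dynamodbDoc.query(p).promise() with the v2 DocumentClient.
async function countEvents(queryPage, params) {
  let total = 0;
  let startKey;
  do {
    const pageParams = { ...params, Select: "COUNT" };
    if (startKey) pageParams.ExclusiveStartKey = startKey; // resume after previous page
    const page = await queryPage(pageParams);
    total += page.Count;
    startKey = page.LastEvaluatedKey; // absent on the last page
  } while (startKey);
  return total;
}

// Usage with a stand-in that returns two pages (3 + 2 matching items).
const pages = [
  { Count: 3, LastEvaluatedKey: { userId: "u1", eventTime: 3 } },
  { Count: 2 },
];
const fakeQuery = async () => pages.shift();
countEvents(fakeQuery, { TableName: "user_events" }).then((n) => console.log(n)); // 5
```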

After you add a new record to your events table, increment the event counter for that user in the statistics table, as shown below:

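// AWS SDK for JavaScript v2
var AWS = require("aws-sdk");
var dynamodbDoc = new AWS.DynamoDB.DocumentClient();

// Atomically add 1 to the user's counter for the given date;
// if_not_exists starts the counter at 0 the first time.
var params = {
  TableName: "user_events_stats",
  Key: {
    userId: "65716110-f4df-11e6-bc64-92361f002671",
    date: "2017-02-17",
  },
  UpdateExpression: "SET #events_cnt = if_not_exists(#events_cnt, :zero) + :one",
  ExpressionAttributeNames: {
    "#events_cnt": "events_cnt",
  },
  ExpressionAttributeValues: {
    ":one": 1,
    ":zero": 0,
  },
};

dynamodbDoc.update(params, function (err, data) {
  if (err) {
    console.error("Failed to update events counter:", err);
    return;
  }
  // Success: data is empty unless ReturnValues is set in params.
});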
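The put into the events table and the counter update are two separate requests, so a failure between them can leave the counter out of step with the events. DynamoDB transactions (TransactWriteItems) can make both writes all-or-nothing. This is a swapped-in technique, not part of the answer above, and the sketch only builds the request parameters; the table names, attribute names, and UTC day boundaries are assumptions.

```javascript
// Builds TransactWriteItems parameters that store one event and bump the
// per-day counter in a single all-or-nothing request.
// Table names, attribute names and UTC day boundaries are assumptions.
function buildLogEventTransaction(userId, eventTimeMs, duration) {
  const date = new Date(eventTimeMs).toISOString().slice(0, 10); // "YYYY-MM-DD" (UTC)
  return {
    TransactItems: [
      {
        Put: {
          TableName: "user_events",
          Item: { userId, eventTime: eventTimeMs, duration },
          // Reject an event with the same key instead of overwriting it.
          ConditionExpression: "attribute_not_exists(eventTime)",
        },
      },
      {
        Update: {
          TableName: "user_events_stats",
          Key: { userId, date },
          UpdateExpression: "SET #cnt = if_not_exists(#cnt, :zero) + :one",
          ExpressionAttributeNames: { "#cnt": "events_cnt" },
          ExpressionAttributeValues: { ":one": 1, ":zero": 0 },
        },
      },
    ],
  };
}

// With the v2 DocumentClient: dynamodbDoc.transactWrite(txParams, callback);
const txParams = buildLogEventTransaction("65716110-f4df-11e6-bc64-92361f002671", 1487331000000, 120);
console.log(txParams.TransactItems[1].Update.Key.date); // 2017-02-17
```

Transactional writes consume twice the write capacity of standard writes, so this costs more per event than the two separate requests.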