2
votes

I am currently thinking about how to structure my DynamoDB table(s).

I have following data structure:

{
  UserId: string,
  Id: string,
  Date: string, //format: yyyy-mm-dd
  From: string,
  To: string,
  Spent: string
}

So my purpose is to store hours of work.

I will query by UserId and Date. So I get the hours of work for the User which is logged in. For example: Query for 2017-09-12.

Or: Hours of work between 2017-08-12 and 2017-09-30

Hash Key: UserId,

Sort Key: Id (I want to query by date, but Hash Key and Sort Key must be unique)

Secondary Local Index: Date

My Question: should I split the table into multiple tables separated by month? When yes: There isn't a way to query over multiple table. So I have to find out which tables are required by the request and query them individual and join the results together?

Edit: And the tables I have to create automatically on the flow? So I have to check if the table already exists and if it doesn't I'll create it in the request?

I am refering to the Amazon DynamoDB docs:

the applications might show uneven access pattern across all the items in the table where the latest customer data is more relevant and your application might access the latest items more frequently and as time passes these items are less accessed, eventually the older items are rarely accessed. If this is a known access pattern, you could take it into consideration when designing your table schema. Instead of storing all items in a single table, you could use multiple tables to store these items. For example, you could create tables to store monthly or weekly data. For the table storing data from the latest month or week, where data access rate is high, request higher throughput and for tables storing older data, you could dial down the throughput and save on resources.

Amazon DynamoDB Guidelines for Tables

If you need more informations, please say it! Thanks in advance!

1

1 Answers

2
votes

I will provide you one option how to implement this with a single table. Here I don't compare pros & cons between single and multiple tables (it's out of scope of my answer). I prefer to go with a single table at least due to: 1) price issue and 2) it's possible to reach your needs with a single table. Table structure:

{
    UserId: String // hash key
    Date: String // range key, format: yyyy-mm-dd
    YearAndMonth: String //format: yyyy-mm, optional field
    Events: List<Map<String, String>> // list of events, where single event - map with following keys: from, to, spent
    Version: Long // for optimistic locking, @DynamoDBVersionAttribute
    totalHours: Number // or maybe totalMinutes, optional field
}

With such approach you will embed all events for a specific day in a single item. So for each new pair 'From & To' you should add this event to Events list (if item by UserID & Date does not exist => you create new one, if exist => safely update Events field with optimistic locking). For calculating hours of work you either need to parse Events field or add one more field for table like totalHours (and update this value on each new event). YearAndMonth field is optional field (you could also skip it), it could simplify filtering if you need to calculate hours for all days in a specific month.