1
votes

I want to create a DynamoDB table that allows me to save notes from users.

The attributes I have:

  • user_id
  • note_id (uuid)
  • type
  • text

The main queries I will need:

  • Get all notes of a certain user
  • Get a specific note
  • Get all notes of a certain type (the less used query)

I know that in terms of performance and DynamoDB partitions, note_id would be the right choice, because the values are unique and would be distributed equally over the partitions. On the other hand, it is much harder to get all notes of a user without scanning all items or using a GSI. And if the keys are unique, I suppose it doesn't make any sense to have a sort key.

The other option would be to use user_id as partition key and note_id as sort key, but if certain users have a much larger number of notes than others, wouldn't that impact my performance?

Is it better to have a unique partition key (like note_id) that scales well with DynamoDB partitions and use GSIs for my queries, or to use a partition key that serves my main query (user_id)?

Thanks

3
Do your two secondary searches really happen independently of knowing the user_id? I found your question whilst in the middle of doing some complex key design of my own and it seems possible you are over-generalising the scenarios. - Andy Dent

3 Answers

1
votes

Possibly the simplest and most cost-effective way would be a single table:

Table Structure

  • note_id (uuid) / hash key
  • user_id
  • type
  • text

Have two GSIs, one for "Get all notes of a certain user" and one for "Get all notes of a certain type (the less used query)":

GSI for "Get all notes of a certain user"

  • user_id / hash key
  • note_id (uuid) / range key
  • type
  • text

A little note on this: which of your queries is the most frequent, "Get all notes of a certain user" or "Get a specific note"? If it's the former, you could swap the GSI keys and the table keys, that is, use user_id + note_id as the key for your table and note_id as the GSI key. This also depends on how you structure your user_id, which I suspect you've already picked up on: make sure your user_id is not sequential; make it a UUID or similar.

GSI for "Get all notes of a certain type (the less used query)"

  • type / hash key
  • note_id (uuid) / range key
  • user_id
  • text

Depending upon the cardinality of the type field, you'll need to test whether a GSI will actually be of benefit here or not.

If the GSI is of little benefit and you need more performance, another option would be to store each type with an array of note_ids in a separate table altogether. Beware of the 400 KB item size limit with this one, and the fact that you'll need to perform another query to get the text of each note.
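To get a feel for how quickly an array of note_ids could hit the item size limit, here is a rough back-of-the-envelope sketch. The byte figures are assumptions for illustration only (the limit read as 400 * 1024 bytes, one byte of list overhead per element), and `max_note_ids` is a made-up helper, so treat the result as an order of magnitude rather than an exact ceiling.

```python
# All three constants are illustrative assumptions, not exact DynamoDB accounting.
ITEM_LIMIT_BYTES = 400 * 1024   # assumption: the item limit read as 400 * 1024 bytes
UUID_TEXT_BYTES = 36            # canonical hyphenated UUID stored as a string
PER_ELEMENT_OVERHEAD = 1        # assumption: rough per-element list overhead


def max_note_ids(reserved_bytes=0):
    """Estimate how many UUID strings fit in one item.

    reserved_bytes is the space taken by everything else in the item
    (attribute names, the type value, and so on).
    """
    usable = max(ITEM_LIMIT_BYTES - reserved_bytes, 0)
    return usable // (UUID_TEXT_BYTES + PER_ELEMENT_OVERHEAD)


if __name__ == "__main__":
    print(max_note_ids())
```

Under these assumptions a single item holds on the order of ten thousand note ids, which a popular type could exceed.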


With this table structure and GSIs, you're able to make a single query for the information you're after, rather than making two if you have two tables.
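As a rough sketch of the layout described above, the function below only builds the argument dictionary for a `create_table` call, so it runs without AWS access. The table name `notes`, the index names `by_user` and `by_type`, and the choice of on-demand billing are assumptions made for the example, not something taken from the question.

```python
def notes_table_params(table_name="notes"):
    """Build create_table arguments: note_id hash key plus two GSIs."""

    def gsi(index_name, hash_attr):
        return {
            "IndexName": index_name,
            "KeySchema": [
                {"AttributeName": hash_attr, "KeyType": "HASH"},
                {"AttributeName": "note_id", "KeyType": "RANGE"},
            ],
            # "ALL" copies every attribute into the index (a "fat" index).
            "Projection": {"ProjectionType": "ALL"},
        }

    return {
        "TableName": table_name,
        # Assumption: on-demand billing, so no throughput figures are needed.
        "BillingMode": "PAY_PER_REQUEST",
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "note_id", "AttributeType": "S"},
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "type", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "note_id", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            gsi("by_user", "user_id"),  # hypothetical index name
            gsi("by_type", "type"),     # hypothetical index name
        ],
    }
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("dynamodb").create_table(**notes_table_params())`.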

Of course, you know your data best: start with what you think is best and then test it to ensure it meets what you're looking for. DynamoDB is priced by provisioned throughput plus the amount of indexed data stored, so "fat" indexes with many projected attributes, as above, cost more to store. If there is a lot of data, it could become more cost effective to perform two queries and store less indexed data.

1
votes

I would use user_id as your primary partition (hash) key and note_id as your primary range (sort) key.

You have already noted that, in an ideal situation, each partition key is accessed with equal regularity to optimise performance (see Design For Uniform Data Access Across Items In Your Tables). The use of user_id is perfectly fine as long as you have a good spread of users who regularly log in. Indeed, AWS specifically encourage this option (see the 'Choosing a Partition Key' table in the link above).

This approach will also make your application code much simpler than your alternative approach.
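To illustrate why the application code stays simple, here is a minimal sketch of the two main lookups against a user_id + note_id key. The functions only build the argument dictionaries in the low-level client format; the table name `notes` and the function names are assumptions for the example.

```python
def notes_for_user(user_id, table_name="notes"):
    """Arguments for Query: every note stored under one user's partition."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "user_id = :u",
        "ExpressionAttributeValues": {":u": {"S": user_id}},
    }


def single_note(user_id, note_id, table_name="notes"):
    """Arguments for GetItem: the full primary key identifies one note."""
    return {
        "TableName": table_name,
        "Key": {"user_id": {"S": user_id}, "note_id": {"S": note_id}},
    }
```

With boto3, these would be used as `client.query(**notes_for_user("u-1"))` and `client.get_item(**single_note("u-1", "n-1"))`. Note that the Query needs only the partition key, while GetItem needs both parts of the key.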

You then have a second choice, which is whether to apply a Global Secondary Index for your "get notes by type" query. A GSI key, unlike a primary key, does not need to be unique (see the AWS GSI guide), therefore I suggest you simply use type as your GSI partition key without a range key.
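A query against such an index might look like the sketch below. The index name `by_type` and table name `notes` are hypothetical, and the `#t` placeholder is used as a precaution so the attribute name `type` cannot clash with a DynamoDB reserved word.

```python
def notes_of_type(note_type, index_name="by_type", table_name="notes"):
    """Arguments for Query on a GSI whose partition key is `type`."""
    return {
        "TableName": table_name,
        "IndexName": index_name,  # hypothetical index name
        # "#t" stands in for the attribute name, ":t" for the value.
        "KeyConditionExpression": "#t = :t",
        "ExpressionAttributeNames": {"#t": "type"},
        "ExpressionAttributeValues": {":t": {"S": note_type}},
    }
```

The dictionary would be passed to a boto3 client as `client.query(**notes_of_type("standard_note"))`. A type with many notes will come back in several pages, so the caller would keep querying with `ExclusiveStartKey` while the response contains `LastEvaluatedKey`.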

The obvious plus side to using a GSI is a faster result when you perform the note type query. However, you should be aware of the downsides too. A GSI has a separate throughput allowance from your table, so you need to provision this in addition to your table throughput (at extra cost). If you don't provision your GSI with enough read units, it could end up slower than a scan on your table. If you don't provision enough write units, your table writes could be throttled, even if your table has enough write units.
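The separate allowances show up directly in the table definition. The sketch below builds `create_table` arguments with one throughput block for the table and another for the index; the capacity numbers, the table name `notes` and the index name `by_type` are placeholders chosen for illustration, not recommendations.

```python
def throughput(read_units, write_units):
    """Shape of a ProvisionedThroughput block."""
    return {"ReadCapacityUnits": read_units, "WriteCapacityUnits": write_units}


def provisioned_table_params(table_rw=(10, 10), index_rw=(5, 10), table_name="notes"):
    """create_table arguments with separately provisioned table and GSI."""
    return {
        "TableName": table_name,
        "BillingMode": "PROVISIONED",
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "note_id", "AttributeType": "S"},
            {"AttributeName": "type", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "note_id", "KeyType": "RANGE"},
        ],
        "ProvisionedThroughput": throughput(*table_rw),
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by_type",  # hypothetical index name
                "KeySchema": [{"AttributeName": "type", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                # Every table write that touches the index consumes these
                # write units, so they should keep pace with the table's.
                "ProvisionedThroughput": throughput(*index_rw),
            }
        ],
    }
```

Keeping the index write units at least as high as the table's is one simple way to avoid the throttling case described above.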

Also, AWS warn that GSIs are updated asynchronously (usually within a fraction of a second, but it can be longer). This means queries on your GSI might return a stale, 'wrong' result if you have table writes and index reads very close together. If this is a problem, you will need to handle it in your application code.
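One way to handle this in application code is to re-read the index a few times with a growing delay until the expected item shows up. The helper below is a generic sketch, not an AWS API: `read_until`, `fetch` and `is_ready` are hypothetical names, and `fetch` would wrap whatever index query the application runs.

```python
import time


def read_until(fetch, is_ready, attempts=5, delay=0.1, sleep=time.sleep):
    """Call fetch() until is_ready(result) is true or attempts run out.

    Returns the last result either way, so the caller can decide what to
    do if the index never caught up within the allowed attempts.
    """
    result = fetch()
    for attempt in range(1, attempts):
        if is_ready(result):
            break
        # Exponential backoff: delay, 2 * delay, 4 * delay, ...
        sleep(delay * 2 ** (attempt - 1))
        result = fetch()
    return result
```

For example, after writing a note the application could call `read_until(query_by_type, lambda notes: new_id in notes)`, where `query_by_type` and `new_id` are the application's own function and value.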

0
votes

I see this as 2 tables: users and notes, with a GSI on the notes table. Not sure how else you could do it. Using user_id as partition key and note_id as sort key means you can only fetch a single note directly when you know both the user_id and the note_id. With DynamoDB, if you're not scanning, a GetItem has to satisfy all the elements of the primary key, so both the partition key and the sort key if there is one (a Query needs only the partition key). Below is how I would do this.

Get all notes of a certain user

When a user creates a note, I would add its note_id to the notes attribute of that user's item in the users table. When you want to get all of a user's notes, retrieve the user and access the array/list of note_ids stored there.

{ userId: xxx,
  notes: [ note_id_1,note_id_2,note_id_3]
}

Get a specific note

A notes table with note_id as the primary key would make that easy.

{
noteId: XXXX,
note: "sfsfsfsfsfsf",
type: "standard_note"
}
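With this two-table layout, "get all notes of a user" becomes two steps: read the user item, then fetch the listed notes. The sketch below covers the second step by splitting the ids into `BatchGetItem` argument dictionaries, since one BatchGetItem call accepts at most 100 keys. The table name `notes` and the function name are assumptions; the key attribute `noteId` follows the item shape shown above.

```python
def batch_get_requests(note_ids, table_name="notes", chunk_size=100):
    """Split note ids into BatchGetItem argument dicts of <= chunk_size keys."""
    requests = []
    for start in range(0, len(note_ids), chunk_size):
        chunk = note_ids[start:start + chunk_size]
        keys = [{"noteId": {"S": note_id}} for note_id in chunk]
        requests.append({"RequestItems": {table_name: {"Keys": keys}}})
    return requests
```

Each dictionary would be passed to a boto3 client as `client.batch_get_item(**request)`. A response can contain `UnprocessedKeys`, which the caller would need to send again.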

Get all notes of a certain type (the less used query)

I would use a GSI on the notes table for this, with the attributes "note_type" and note_id projected onto it.

Update

You can pull this off with one table and a GSI (see the two answers below for how), but I would not do it. Your data model is so simple; why make it more complicated than users and notes?