13
votes

According to the DynamoDB documentation: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html

"You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table."

But in my experience, you always end up doing the opposite because of partition key design.

Consider the following situation. We have several user roles, for example "admin", "manager", and "worker". The usual workflow of an admin is to CRUD manager data, where the read operation fetches not a single manager but the whole list of managers. The same goes for the manager, who CRUDs worker data. In both cases there are only two key usage scenarios:

  • get a list of all items (item key doesn't matter)
  • work with a particular item using its full key.

Naturally, we should have a uniformly distributed partition key (as the documentation emphasises), so we can't use the user role for it and should use the user id instead. Since the partition key is already a random id, we don't need a sort key at all; it would serve no purpose, because the partition key alone already identifies exactly one user. At this point we realize that the user id works like a charm for CUD operations, but for every R operation we have to scan the whole table and then filter the result by user role, which is inefficient. How can this be improved? Very naturally: give each user type its own table! Then the admin API scans the manager table for the manager list, and the manager API scans the worker table for the worker list.

I have used DynamoDB for almost a year and still don't get it. For me, the reality is that in real-life scenarios the sort key is something you can never use. The only real use case I had for it was accessing items like "agreements" that belong to two users of different types at the same time, so the primary key was { partition: "managerId", sort: "userId" } and the global secondary index was { partition: "userId", sort: "managerId" }. That way I could efficiently query for the list of all agreements of a particular manager or of a particular user, providing only the corresponding manager or user id. The approach is discussed in the docs here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html.
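For illustration, the agreements layout described above could be sketched roughly as follows. This only builds the request parameters as plain dictionaries in the shape accepted by the low-level DynamoDB API; the table name "Agreements", the index name "UserIdIndex", and the on-demand billing mode are assumptions made for the example, not details from the question.

```python
# Sketch only: "Agreements", "UserIdIndex" and PAY_PER_REQUEST are assumed values.

def agreements_table_definition(table_name="Agreements", index_name="UserIdIndex"):
    """Build CreateTable parameters: managerId/userId key plus an inverted GSI."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "managerId", "AttributeType": "S"},
            {"AttributeName": "userId", "AttributeType": "S"},
        ],
        # Primary key: { partition: "managerId", sort: "userId" }
        "KeySchema": [
            {"AttributeName": "managerId", "KeyType": "HASH"},
            {"AttributeName": "userId", "KeyType": "RANGE"},
        ],
        # Inverted index: { partition: "userId", sort: "managerId" }
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": "userId", "KeyType": "HASH"},
                    {"AttributeName": "managerId", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def agreements_for_manager(manager_id, table_name="Agreements"):
    """Build Query parameters: all agreements of one manager (base table)."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "managerId = :m",
        "ExpressionAttributeValues": {":m": {"S": manager_id}},
    }


def agreements_for_user(user_id, table_name="Agreements", index_name="UserIdIndex"):
    """Build Query parameters: all agreements of one user (via the inverted GSI)."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "userId = :u",
        "ExpressionAttributeValues": {":u": {"S": user_id}},
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed along the lines of `boto3.client("dynamodb").query(**agreements_for_manager("m-1"))`.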

I feel that I don't understand the concept at all. What would be an effective key schema for the example above that uses only one DynamoDB table for both user types?

2
I find the statement "You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table." to be extremely suspect. That sounds like an extreme over-generalization of NoSQL to me. I would NOT try to make that a goal of your application design. Use DynamoDB however it works best for your application, given the type of queries that you will need to perform. - Mark B
@MarkB I've found this article showing how they suggest using one table with a number of techniques, but I will need a lot of time to understand what they are doing: docs.aws.amazon.com/amazondynamodb/latest/developerguide/… - Arsenii Fomin
I second Mark B’s comment. Take that with a huge grain of salt. I think it’s a gross overgeneralization and the reality in the field is far from it. In many cases it becomes a really bad idea to store everything in one table. - Mike Dinescu
I'd say the statement from AWS that the majority of NoSQL storage designs should have one table is complete and utter nonsense. In answer to your question, you would use a graph-node schema for a single table (docs.aws.amazon.com/amazondynamodb/latest/developerguide/…). However, adopting this design would impact your application code greatly and would likely lead to poor alignment between your storage and business logic code. - F_SO_K
This talk from re:Invent 2018 might help in understanding the single-table design philosophy: youtube.com/watch?v=HaEPXoXVf2k - Deepak Rao

2 Answers

1
votes

It sounds like what you need in this case is a Global Secondary Index (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html) where the partition key is the user role. That way, you can query all users with a particular role through that UserRoleIndex and, with the help of a sort key on the user ID, single out one particular user within that role.

Alternatively, if you are starting from scratch with a new table, you might not even need an index (unless you don't know the role of a user when you delete them). You can use a "composite primary key" (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey) where the partition key and the sort key would be the same as in the index I am suggesting above.

Using the same notation that you used in your question, I would recommend { partition: "userRole", sort: "userId" }.
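As a rough sketch of that composite key, the request parameters might look like the following. The functions only build plain dictionaries in the shape the low-level DynamoDB API expects; the table name "Users", the string attribute types, and the on-demand billing mode are assumptions for the example.

```python
# Sketch only: "Users", string-typed keys and PAY_PER_REQUEST are assumed values.

def users_table_definition(table_name="Users"):
    """Build CreateTable parameters for { partition: "userRole", sort: "userId" }."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "userRole", "AttributeType": "S"},
            {"AttributeName": "userId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "userRole", "KeyType": "HASH"},   # partition key
            {"AttributeName": "userId", "KeyType": "RANGE"},    # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def list_users_by_role(role, table_name="Users"):
    """Build Query parameters: every user with the given role, no scan needed."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "userRole = :role",
        "ExpressionAttributeValues": {":role": {"S": role}},
    }


def get_user(role, user_id, table_name="Users"):
    """Build GetItem parameters: one user, addressed by the full composite key."""
    return {
        "TableName": table_name,
        "Key": {"userRole": {"S": role}, "userId": {"S": user_id}},
    }
```

With boto3 available, the manager list would then be something like `boto3.client("dynamodb").query(**list_users_by_role("manager"))`. Note that `get_user` needs the role as well as the id, which is why the index variant is the fallback when the role is not known.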

DynamoDB can be hard to understand sometimes, and there definitely are cases where a traditional SQL database makes more sense. This video from AWS re:Invent 2018 is great for understanding the difference between the two: https://www.youtube.com/watch?v=HaEPXoXVf2k&feature=youtu.be.

In your case, though, it looks like you have a very clear access pattern, so DDB would work for you.

0
votes

You can have a schema like:

user_id, role, <other columns>

where

  • user_id = hash-key
  • role = GSI hash-key

This way, you can get the list of all managers by querying the GSI.

With a GSI, DynamoDB creates and maintains what is effectively another table for you, so you don't need to maintain multiple tables yourself.
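A minimal sketch of this schema, expressed as request parameters for the low-level DynamoDB API, could look like this. The table name "Users", the index name "RoleIndex", the string attribute types, and the on-demand billing mode are assumptions for the example.

```python
# Sketch only: "Users", "RoleIndex" and PAY_PER_REQUEST are assumed values.

def users_table_with_role_index(table_name="Users", index_name="RoleIndex"):
    """Build CreateTable parameters: user_id hash key plus a GSI keyed on role."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "role", "AttributeType": "S"},
        ],
        # Base table: user_id is the hash key.
        "KeySchema": [{"AttributeName": "user_id", "KeyType": "HASH"}],
        # GSI: role is the hash key; all attributes are copied into the index.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [{"AttributeName": "role", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def query_users_by_role(role, table_name="Users", index_name="RoleIndex"):
    """Build Query parameters that read all users with a given role from the GSI."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # "#r" is a name placeholder, which avoids any clash with reserved words.
        "KeyConditionExpression": "#r = :role",
        "ExpressionAttributeNames": {"#r": "role"},
        "ExpressionAttributeValues": {":role": {"S": role}},
    }
```

With boto3 available, the manager list would be fetched with something like `boto3.client("dynamodb").query(**query_users_by_role("manager"))`, while single-user reads and writes keep using `user_id` on the base table.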

Let me know if you have any questions.