1 vote

Does GSI Overloading provide any performance benefits, e.g. by allowing cached partition keys to be routed more efficiently? Or is it mostly about preventing you from running out of GSIs? Or does it open up other query patterns that aren't immediately obvious?

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html

e.g. if you have a base table and you want to partition it so you can query a specific attribute (which becomes the PK of the GSI) over two dimensions, does it make any difference whether you create 1 overloaded GSI or 2 non-overloaded GSIs?

For an example of what I'm referring to, see the attached image:

https://drive.google.com/file/d/1fsI50oUOFIx-CFp7zcYMij7KQc5hJGIa/view?usp=sharing

The base table has documents which can be in a published or draft state. Each document is owned by a single user. I want to be able to query by user to find (see the sketch after this list):

  1. Published documents by date
  2. Draft documents by date
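Roughly what I have in mind for the overloaded option, sketched with boto3 (the table name, the GSI1 / GSI1PK / GSI1SK index and attribute names, and the example values are all placeholders):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("Documents")

    # Base-table item: one document owned by one user. The GSI keys encode
    # the owner plus the state and date, so a single index serves both queries.
    table.put_item(Item={
        "PK": "DOC#123",
        "SK": "METADATA",
        "GSI1PK": "USER#alice",
        "GSI1SK": "PUBLISHED#2019-03-01",  # or "DRAFT#2019-03-01"
        "title": "My document",
    })

    # 1. Published documents for a user, sorted by date
    published = table.query(
        IndexName="GSI1",
        KeyConditionExpression=Key("GSI1PK").eq("USER#alice")
        & Key("GSI1SK").begins_with("PUBLISHED#"),
    )

    # 2. Draft documents for a user, sorted by date
    drafts = table.query(
        IndexName="GSI1",
        KeyConditionExpression=Key("GSI1PK").eq("USER#alice")
        & Key("GSI1SK").begins_with("DRAFT#"),
    )

The alternative would be 2 non-overloaded GSIs, e.g. one keyed on owner and published date and one on owner and draft date, queried the same way but without the state prefix in the sort key.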

I'm asking in relation to the more recent DynamoDB best practice guidance that implies all applications only require one table. The techniques shown in that documentation squash a reasonably complex relational model into 1 DynamoDB table and 2 GSIs and yet still support 10-15 query patterns.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-relational-modeling.html

I'm trying to understand why someone would go down this route as it seems incredibly complicated.

I just wrote a long Q&A which might help a bit: stackoverflow.com/questions/55152296/… - F_SO_K

In short, don't do it! - F_SO_K

2 Answers

2 votes

The idea – in a nutshell – is to avoid the overhead of doing joins on the database layer, or of having to go back to the database to effectively do the join on the application layer. By having the data already sliced in the format your application requires, all you really need to do is one select * from table where x = y style call, which returns multiple entities in one request (in your example that could be Users and Documents). This makes it extremely efficient and scalable at the database level. But it also means you'll be less flexible, as you need to know the access patterns in advance and model your data accordingly.
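For example, in a single-table layout where a user item and its documents share a partition key, one query returns the whole pre-joined set. A minimal sketch, assuming placeholder table, key and value names (AppTable, PK, SK, USER#alice):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("AppTable")

    # One query on the partition key returns the user item and all of that
    # user's document items in a single round trip - no join needed.
    response = table.query(KeyConditionExpression=Key("PK").eq("USER#alice"))

    for item in response["Items"]:
        if item["SK"] == "PROFILE":          # the user entity
            print("User:", item)
        elif item["SK"].startswith("DOC#"):  # a document entity
            print("Document:", item)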

See Rick Houlihan's excellent talk (https://www.youtube.com/watch?v=HaEPXoXVf2k) for why you'd want to do this.

I don't think it has any performance benefits, at least none that are called out anywhere – which makes sense, since it's the same query and storage engine either way.

That being said, I think there are some practical reasons why you'd want to go with a single table, as it keeps your infrastructure somewhat simpler: you don't have to keep track of metrics and/or provisioning settings for separate tables.

-1 votes

My opinion would be the cost of storage and provisioned throughput: each additional GSI stores its own projected copy of the data and has its own capacity to provision.

Apart from that, I'm not sure it matters as much with the new limit of 20 GSIs per table.