1
votes

Coming from a SQL background, trying to undestand NoSQL particularly DynamoDB options. Given this schema:

{
    "publist": [{
            "Author": "John Scalzi",
            "Title": "Old Man's War",
            "Publisher": "Tor Books",
            "Tags": [
                "DeepSpace",
                "SciFi"
            ]
        },
        {
            "Author": "Ursula Le Guin",
            "Title": "Wizard of Earthsea",
            "Publisher": "Mifflin Harcourt",
            "Tags": [
                "MustRead",
                "Fantasy"
            ]
        },
        {
            "Author": "Cory Doctorow",
            "Title": "Little Brother",
            "Publisher": "Doherty"
        }
    ]
}

I could have the main table have Author/Title as hash/range keys. A global secondary index could be Publisher/Title. What are the best practices here. How can I get a list of all Authors for a publisher without a total table scan? Cant have a secondary index because Publisher/Author is not unique! Also what are my options if I want all the titles that have a tag of DeepSpace?

EDIT: See RPM & Vikdor answers below. GSI need not be unique, so Publisher/Author is possible. But question remains: is there any workaround for getting all authors by tag, without full table scan?

3

3 Answers

1
votes

Cant have a secondary index because Publisher/Author is not unique!

Sure you can, just make sure your Publisher/Title index has Author as a projection - you can then do a query by publisher and just iterate over the results and collect the authors.

When you set up your indexes, you can choose which attributes are projected into the index. Having a Publisher or Publisher/Title key doesn't mean you can only view the Publisher or Publisher and Title, it means you can only query by Publisher or Title, so if you have all attributes or the Author attribute projected into your index, you can get a list of authors by publisher using a query and not a full table scan.

1
votes

Cant have a secondary index because Publisher/Author is not unique!

The (hash primary key, range primary key) tuple need not be unique for defining a Global Secondary Index. This is only a requirement for the Table level key definitions, i.e. the table cannot have multiple rows with the same values of (hash primary key, range primary key) tuple.

How can I get a list of all Authors for a publisher without a total table scan

You define a GSI on Publisher (Hash PK), Author (Range PK) and use DynamoDB query on the GSI with the Publisher attribute set as the Hash Key Value.

Unlike in SQL where it is possible to create non-clustered indexes on arbitrary columns based on the retrieval patterns, in DynamoDB, as the number of Local Secondary Indexes and Global Secondary Indexes are limited per table, it is important to list down the use cases of retrieving data before identifying the Hash Primary Key and Range Primary Key for a table and leverage Local Secondary Indexes as much as possible, as they use the table's read & write capacity and are strongly consistent (you can choose to run eventually-consistent queries too on LSIs to save capacity). GSIs need their own read & write capacity and are eventually-consistent.

0
votes

Unfortunately this is not supported currently in DynamoDB. DDB does not provide the capability to query on nested documents alike MongoDB.

In this situation consider modelling data differently and put the nested document in a separate table.

hope this will help.

Cheers,