11
votes

Let's say I have a DynamoDB table such as

TableName: 'Items'
Key: {Hash: 'Id'}

Each item has a name and belongs to a customer, so I also have an Index on

{Hash: CustomerId, Range:Name}

Let's say I have this data:

Id    CustomerId    Name
1     18            Milk
2     42            Orange juice
3     42            Apple juice
4     42            Coffee
5     54            Tomato juice

Now, I want to query for all items for a particular customer and filter the results for partial names (essentially a search operation). For example, give me all items belonging to customer 42 that contains "juice" in its name (Orange juice and Apple juice are expected results).

If I query for CustomerId = '42' AND contains(Name, 'juice') I will get an error saying that KeyConditionExpression doesn't support contains. I can sort of understand this limitation since contains means having to scan all items (within the given hash key), but, well, you can query for all items with CustomerId = '42' which is also a full scan within that hash, so I'm not sure I understand this limitation. Things like begins_with are supported just as expected (which makes sense since it's easy to quickly return a subset from a sorted set).

Anyway, so I curse a little bit and say I'll just use a FilterExpression instead and live with the wasted RCU:s, resulting in a query with

KeyConditionExpression: CustomerId = '42'
FilterExpression: contains(Name, 'juice')

But now I get an error saying that I am not allowed to include primary key attributes in my FilterExpression ("use KeyConditionExpression instead!").

This leaves me in somewhat of a dilemma. I can't filter with contains in my KeyCondition and I can't filter on Name in my FilterExpression. Should I have to create a separate index only on CustomerId in order to implement my use-case or is there any other way to solve this...?

5
How frustrating, and interesting! Creating a whole new index would be expensive. You would be much better off duplicating the Name attribute (so one attribute called NameKey and one just called Name). Then you could use the FilterExpression on the non-key Name field like you were trying in your second attempt. Still rubbish, but less costly.F_SO_K
Thanks Stu. That did cross my mind, but it felt.. rubbish like you said. It might be the best compromise though. Would be cool to get some official aws response to this, I might copy the post to the aws forums as well.JHH
How did you manage to finally solve this problem because I am facing the same problem right nowSalman Hasrat Khan
It's been a while but if my memory serves me right, I ended up doing client side filtering since my data set per customer was fairly small. 😕JHH

5 Answers

11
votes

With DynamoDB, I think the best solution is to store the data in the shape you later intend to read.

If you find yourself requiring complex read queries you might have fallen into the trap of expecting DynamoDB to behave like an RDBMS, which it is not. Transform and shape your data on the write, keep the read simple.

2
votes

For stuff like this you should consider the concepts of composite keys and GSI overloading and re-design your table to fit your access patterns.

As per https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/

Use composite attributes. Try to combine more than one attribute to form a unique key, if that meets your access pattern. For example, consider an orders table with customerid+productid+countrycode as the partition key and order_date as the sort key.

So you could do something like designing your table to hold an index of customerid#name

2
votes

DynamoDB only allow begin_with for key conditions so contains is not supported, but for your case it is possible to arrange the rangeKey in hierarchical order like: CustomerId Name 18 Milk 42 juice.Orange 42 juice.Apple 42 Coffee 54 Tomato juice

So the query can be structured like KeyConditionExpression: CustomerId = '42' AND Name BEGINS_WITH 'juice'

1
votes

This query behave as like query in relational db

database.scan() 
     .filterExpression('begins_with(#name ,:name) or begins_with(#someno,:name)') 
    .expressionAttributeNames({ "#name": "name","#someno":"someno"}) 
    .expressionAttributeValues({ ":name" : data}) 
    .exec().promise(); 
0
votes

Just duplicate your attribute and apply the filter on it