I am fairly new to NoSQL databases and I usually don't have an issue, but in my latest mobile app I am trying to use Amazon's DynamoDB and I have hit a wall.
Generally, I have no issues if the table contains unique, single entries. However, if the table contains a list of items associated with a specific attribute, I am not sure how to retrieve/query the data properly.
For example, below is a table. We will call it AWS_DDB_SESSIONS. It contains all the information about the user sessions (where SessionID is the hash key):
Assume that the table data is represented as follows: SessionID(hash), StartTime, StopTime, UserEmail
TableName: AWS_DDB_SESSIONS

    "AAADFR101", "01:10:00.000Z", "11:10:00.000Z", "[email protected]"
    "BBBLWL102", "12:20:00.000Z", "18:20:00.000Z", "[email protected]"
    "CCCUFE103", "03:00:00.000Z", "03:30:00.000Z", "[email protected]"

I can easily get a single entry if I specify the hash key value (SessionID). For example, if I wanted the first item in the list (AAADFR101), I would just make a "getItem" request with the specified SessionID:
Amazon's Java SDK PseudoCode:

    AmazonSDK.API.Table.getItem("AAADFR101");

The SQL equivalent is:

    select * from AWS_DDB_SESSIONS where SessionID='AAADFR101';

Expected Results:

    "AAADFR101", "01:10:00.000Z", "11:10:00.000Z", "[email protected]"
However, if I want to get all the sessions that are associated with "[email protected]", I have no idea what to do.
Amazon's Java SDK PseudoCode:

    // Create a Condition...
    Condition userEmailCondition = new Condition()
        .withComparisonOperator(ComparisonOperator.EQ.toString())
        .withAttributeValueList(new AttributeValue().withS("[email protected]"));

    // Create the Map of Conditions...
    Map<String, Condition> userEmailConditionMap = new HashMap<String, Condition>();
    userEmailConditionMap.put("UserEmail", userEmailCondition);

    QueryRequest qRequest = new QueryRequest().withTableName("AWS_DDB_SESSIONS")
        .withKeyConditions(userEmailConditionMap);
    QueryResult qResult = AmazonSDK.API.Table.query(qRequest);
    // NOTE: It errors here because the query doesn't contain the "SessionID"
    // value within the query (which is the primary key/hash for this table).

The SQL equivalent is:

    select * from AWS_DDB_SESSIONS where UserEmail='[email protected]';

Expected Results:

    "AAADFR101", "01:10:00.000Z", "11:10:00.000Z", "[email protected]"
    "CCCUFE103", "03:00:00.000Z", "03:30:00.000Z", "[email protected]"
As mentioned above, the Java pseudocode doesn't work because Amazon's DynamoDB API requires the hash key (SessionID) whenever you perform a query. Even if you change the DynamoDB table's schema to use a hash key (SessionID) and a range key (UserEmail), it does you no good, because the query still has to name a SessionID. Essentially, you run into the same problem.
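To make the problem concrete, here is a minimal in-memory sketch of how I picture the table. It is plain Java, not the AWS SDK: the class `HashKeyTableSketch`, its methods, and the two e-mail addresses are all made up for illustration (the real addresses are not shown above). It only models the idea that a lookup by hash key can jump straight to an item, while a lookup by any other attribute has nothing to jump to and must inspect every item.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a table keyed by SessionID. NOT the AWS SDK.
class HashKeyTableSketch {

    // One row of AWS_DDB_SESSIONS.
    static final class Session {
        final String sessionId, startTime, stopTime, userEmail;

        Session(String sessionId, String startTime, String stopTime, String userEmail) {
            this.sessionId = sessionId;
            this.startTime = startTime;
            this.stopTime = stopTime;
            this.userEmail = userEmail;
        }
    }

    // Hash key (SessionID) -> item.
    private final Map<String, Session> items = new LinkedHashMap<>();

    void putItem(Session s) {
        items.put(s.sessionId, s);
    }

    // Comparable to getItem: a direct lookup by hash key.
    Session getItem(String sessionId) {
        return items.get(sessionId);
    }

    // UserEmail is not a key, so every item has to be inspected.
    List<Session> scanByUserEmail(String userEmail) {
        List<Session> matches = new ArrayList<>();
        for (Session s : items.values()) {
            if (s.userEmail.equals(userEmail)) {
                matches.add(s);
            }
        }
        return matches;
    }

    // Sample data from the question; the e-mail addresses are hypothetical.
    static HashKeyTableSketch sampleTable() {
        HashKeyTableSketch t = new HashKeyTableSketch();
        t.putItem(new Session("AAADFR101", "01:10:00.000Z", "11:10:00.000Z", "john.doe@example.com"));
        t.putItem(new Session("BBBLWL102", "12:20:00.000Z", "18:20:00.000Z", "jane.doe@example.com"));
        t.putItem(new Session("CCCUFE103", "03:00:00.000Z", "03:30:00.000Z", "john.doe@example.com"));
        return t;
    }

    public static void main(String[] args) {
        HashKeyTableSketch table = sampleTable();
        System.out.println(table.getItem("AAADFR101").startTime);
        for (Session s : table.scanByUserEmail("john.doe@example.com")) {
            System.out.println(s.sessionId);
        }
    }
}
```

In this model, `scanByUserEmail` does return the two sessions I want, but only by walking the whole table, which is exactly what I was hoping a query would let me avoid.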
I tried to come up with some alternative solutions, but I seem to come back to the same problem no matter how I approach it.
Alternative #1
The first alternative that I thought of was to add another attribute to serve as a unique identifier. So you would change the schema:

Original Schema: SessionID(hash), StartTime, StopTime, UserEmail
New Table Schema: RecordID(hash), SessionID, StartTime, StopTime, UserEmail
This creates a table similar to one that you would use in a Relational Database (RDB). However, it suffers from the original problem of not being able to query on UserEmail.
Alternative #2
Some online research suggested that you could split the table up into smaller tables based on various attributes (UserEmail). So for every user you could create a table to store their session info in. Hence, our original sample would be split into tables with the following data:

TableName: [email protected]

    "BBBLWL102", "12:20:00.000Z", "18:20:00.000Z"

TableName: [email protected]

    "AAADFR101", "01:10:00.000Z", "11:10:00.000Z"
    "CCCUFE103", "03:00:00.000Z", "03:30:00.000Z"
In some cases this approach is recommended because it nudges you toward removing the redundant data (UserEmail), and it may lower your costs because each query scans less data. But it seems to create a scalability issue. If you have 5 people using your app, then you only have 5 tables to worry about. However, if your user base grows to 1,000, 100K, or 1 million, then you would have a crazy number of tables to keep track of. So I am not sure if this is the best approach.
Alternative #3
I tried to add some indexes for the attributes that I thought were important, but it didn't work as I had expected, and I still had the same original problem.
Alternative #4
Change the hash key from the SessionID to the UserEmail, and then use the SessionID as the range key. But it seems that this may not work under certain circumstances, such as when these values overlap, have different attribute fields, or if the attributes have the same value. For example, if John Doe had another session with an ID of "AAADFR101", then it would overwrite the existing entry instead of creating 2 entries with similar information. You can overcome this challenge by adding a new attribute (RecordID) and using that as the primary key (hash value). However, you are then back to the original problem when you try to query.
Ugh... My head is starting to hurt. Can someone point me in the right direction?
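Here is a rough in-memory sketch of what I mean by the overwrite in Alternative #4. Again, this is plain Java and not the AWS SDK: `HashRangeTableSketch`, its methods, and the e-mail address are hypothetical names of my own. It assumes, as far as I understand it, that an item is identified by the (hash, range) pair, so writing the same pair twice replaces the earlier item.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a hash+range table. NOT the AWS SDK.
class HashRangeTableSketch {

    // Hash key (UserEmail) -> range key (SessionID) -> {StartTime, StopTime}.
    private final Map<String, TreeMap<String, String[]>> items = new HashMap<>();

    // Stores an item; returns true if an item with the same
    // (UserEmail, SessionID) pair already existed and was replaced.
    boolean putItem(String userEmail, String sessionId, String startTime, String stopTime) {
        String[] previous = items
                .computeIfAbsent(userEmail, k -> new TreeMap<>())
                .put(sessionId, new String[] {startTime, stopTime});
        return previous != null;
    }

    // Comparable to a query on the hash key alone:
    // all SessionIDs for one user, in range-key order.
    List<String> querySessionIds(String userEmail) {
        TreeMap<String, String[]> byRange = items.get(userEmail);
        return byRange == null ? new ArrayList<>() : new ArrayList<>(byRange.keySet());
    }

    // Returns {StartTime, StopTime} for one item, or null if it does not exist.
    String[] getItem(String userEmail, String sessionId) {
        TreeMap<String, String[]> byRange = items.get(userEmail);
        return byRange == null ? null : byRange.get(sessionId);
    }

    public static void main(String[] args) {
        HashRangeTableSketch table = new HashRangeTableSketch();
        String john = "john.doe@example.com"; // hypothetical address
        table.putItem(john, "AAADFR101", "01:10:00.000Z", "11:10:00.000Z");
        table.putItem(john, "CCCUFE103", "03:00:00.000Z", "03:30:00.000Z");
        // Same user, same SessionID: replaces the first item instead of adding a second one.
        boolean replaced = table.putItem(john, "AAADFR101", "20:00:00.000Z", "21:00:00.000Z");
        System.out.println(replaced);
        System.out.println(table.querySessionIds(john));
    }
}
```

In this model a query on UserEmail alone does return both of John's sessions, but the third `putItem` leaves only two items in the table, with the original "AAADFR101" times gone.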
Thanks in advance,
-Randy