9
votes

Is it possible to specify an exclusive start key when querying a DynamoDB table via global secondary index?

I'm using the aws-java-sdk version 1.6.10 and executing queries with a QueryExpression and a DynamoDBMapper. Here's the gist of what I'm trying to do:

MappedItem key = new MappedItem();
key.setIndexedAttribute(attributeValue);

Map<String, AttributeValue> exclusiveStartKey = new HashMap<String, AttributeValue>();
exclusiveStartKey.put(MappedItem.INDEXED_ATTRIBUTE_NAME, new AttributeValue().withS(attributeValue));
exclusiveStartKey.put(MappedItem.TIMESTAMP, new AttributeValue().withN(startTimestamp.toString()));

DynamoDBQueryExpression<MappedItem> queryExpression = new DynamoDBQueryExpression<MappedItem>();
queryExpression.withIndexName(MappedItem.INDEX_NAME);
queryExpression.withConsistentRead(Boolean.FALSE);
queryExpression.withHashKeyValues(key);
queryExpression.setLimit(maxResults * 2);
queryExpression.setExclusiveStartKey(exclusiveStartKey);

This results in a 400 error saying that the specified start key is invalid. The TIMESTAMP is the range key for the table index and for the global secondary index, and the attribute value pair is valid (i.e. there is an item in the table with the values passed as the hash and range key for the index, and the attribute passed as the index is the hash key of the global secondary index).

Is there something I missed or is this not possible?

3 Answers

14
votes

I had the same issue and just got it sorted. :) It's too late to answer the original question, but I hope someone finds this helpful.

When you query or scan a table through a secondary index with pagination, the ExclusiveStartKey must include the primary key attributes of both the table and the index (as the map keys), each paired with its last evaluated value (as the attribute value).

Just print the LastEvaluatedKey from the query or scan result to see the expected format.

// let's just assume that we have a table to store details of products
Map<String, AttributeValue> exclusiveStartKey = new HashMap<String, AttributeValue>();
// primary key of the table
exclusiveStartKey.put("productId", new AttributeValue().withS("xxxx"));
exclusiveStartKey.put("productSize", new AttributeValue().withS("XL"));
// primary key of the index
exclusiveStartKey.put("categoryId", new AttributeValue().withS("xx01"));
exclusiveStartKey.put("subCategoryId", new AttributeValue().withN("1"));
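
To avoid assembling that map by hand, the key attributes can be copied from the last item of the previous page. The following is only a sketch: StartKeys and fromLastItem are hypothetical names, not part of the AWS SDK, and the type parameter V stands in for the SDK's AttributeValue so that the snippet compiles without the SDK on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the AWS SDK.
final class StartKeys {
    private StartKeys() {}

    /**
     * Copies the table key and index key attributes of the last returned item
     * into a new map. V is a stand-in for the SDK's AttributeValue (assumption).
     */
    static <V> Map<String, V> fromLastItem(Map<String, V> lastItem,
                                           List<String> tableKeyNames,
                                           List<String> indexKeyNames) {
        Map<String, V> startKey = new LinkedHashMap<String, V>();
        copyKeys(lastItem, tableKeyNames, startKey);
        copyKeys(lastItem, indexKeyNames, startKey);
        return startKey;
    }

    // Fails fast if the item lacks a key attribute, since an incomplete
    // start key is what the original question ran into.
    private static <V> void copyKeys(Map<String, V> item,
                                     List<String> names,
                                     Map<String, V> target) {
        for (String name : names) {
            V value = item.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Last item is missing key attribute: " + name);
            }
            target.put(name, value);
        }
    }
}
```

With the product table above, the table key names would be the two table attributes and the index key names the two category attributes; non-key attributes of the item are left out of the result. If the query response already carries a LastEvaluatedKey, passing that map back unchanged is simpler than rebuilding it.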

8
votes

Per an Amazonian, this is not possible: https://forums.aws.amazon.com/thread.jspa?threadID=146102&tstart=0

A workaround that worked for my use case, though, was to just specify a RangeKeyCondition greater than the last retrieved object's timestamp. Here's the idea:

Condition hashKeyCondition = new Condition();
hashKeyCondition.withComparisonOperator(ComparisonOperator.EQ).withAttributeValueList(new AttributeValue().withS(hashKeyAttributeValue));

Condition rangeKeyCondition = new Condition();
rangeKeyCondition.withComparisonOperator(ComparisonOperator.GT).withAttributeValueList(new AttributeValue().withN(timestamp.toString()));

Map<String, Condition> keyConditions = new HashMap<String, Condition>();
keyConditions.put(MappedItem.INDEXED_ATTRIBUTE_NAME, hashKeyCondition);
keyConditions.put(MappedItem.TIMESTAMP, rangeKeyCondition);


QueryRequest queryRequest = new QueryRequest();
queryRequest.withTableName(tableName);
queryRequest.withIndexName(MappedItem.INDEX_NAME);
queryRequest.withKeyConditions(keyConditions);

QueryResult result = amazonDynamoDBClient.query(queryRequest);

List<MappedItem> mappedItems = new ArrayList<MappedItem>();

for(Map<String, AttributeValue> item : result.getItems()) {
    MappedItem mappedItem = dynamoDBMapper.marshallIntoObject(MappedItem.class, item);
    mappedItems.add(mappedItem);
}

return mappedItems;

Note that the marshallIntoObject method is deprecated in favor of a protected method in the DynamoDBMapper class, but it would be easy enough to write your own marshaller if a future upgrade broke the mapping.

Not as elegant as using the mapper but it accomplishes the same thing.
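
One thing to watch with this approach: a strict greater-than condition excludes every item whose timestamp equals the last one retrieved, so if a page boundary falls between two items with the same timestamp, the second one is never returned. The in-memory sketch below illustrates that. TimestampPagingSketch and nextPage are hypothetical names, and the list of longs is a stand-in for the index's range key values; nothing here calls DynamoDB.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory illustration only; names are hypothetical and no AWS call is made.
final class TimestampPagingSketch {
    private TimestampPagingSketch() {}

    /**
     * Returns up to `limit` timestamps strictly greater than `lastSeen`.
     * Assumes `sortedTimestamps` is in ascending order, like a range key.
     */
    static List<Long> nextPage(List<Long> sortedTimestamps, long lastSeen, int limit) {
        List<Long> page = new ArrayList<Long>();
        for (long ts : sortedTimestamps) {
            if (ts > lastSeen && page.size() < limit) {
                page.add(ts);
            }
        }
        return page;
    }
}
```

With timestamps 1, 2, 2, 3 and a limit of 2, the first page (lastSeen = 0) holds 1 and 2, and the second page (lastSeen = 2) holds only 3, so the second item with timestamp 2 is skipped. If timestamps are unique per hash key, this does not arise.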

6
votes

OK I'm super late to the party but I have figured out what is going on. This isn't a bug, it's working as it should, but I've never seen it in the documentation.

It turns out that in a global secondary index, the table's primary key attributes are used as "tiebreakers." That is, if two items have the same GSI hash and sort keys, the table's primary key is used to order them within the GSI. That means that when you query a GSI with an exclusive start key, you need both the GSI key attributes and the table's primary key attributes in order to start at exactly the right place.
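
Here is a toy model of that ordering, as I understand it. It is a plain in-memory sketch, not how DynamoDB is implemented, and GsiOrderingSketch, Entry, INDEX_ORDER and resumeAfter are all made-up names. Entries within one GSI hash key are ordered by the GSI sort key first and by the table's hash key second.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy model of the ordering described above; all names are hypothetical.
final class GsiOrderingSketch {
    private GsiOrderingSketch() {}

    /** One index entry within a single GSI hash key. */
    static final class Entry {
        final long gsiSortKey;      // e.g. a timestamp
        final String tableHashKey;  // the table's own primary key

        Entry(long gsiSortKey, String tableHashKey) {
            this.gsiSortKey = gsiSortKey;
            this.tableHashKey = tableHashKey;
        }
    }

    /** GSI sort key first, table key as the tiebreaker. */
    static final Comparator<Entry> INDEX_ORDER = new Comparator<Entry>() {
        @Override
        public int compare(Entry a, Entry b) {
            int bySortKey = Long.compare(a.gsiSortKey, b.gsiSortKey);
            return bySortKey != 0 ? bySortKey : a.tableHashKey.compareTo(b.tableHashKey);
        }
    };

    /** Returns the entries that come strictly after startKey in index order. */
    static List<Entry> resumeAfter(List<Entry> partition, Entry startKey) {
        List<Entry> sorted = new ArrayList<Entry>(partition);
        Collections.sort(sorted, INDEX_ORDER);
        List<Entry> rest = new ArrayList<Entry>();
        for (Entry e : sorted) {
            if (INDEX_ORDER.compare(e, startKey) > 0) {
                rest.add(e);
            }
        }
        return rest;
    }
}
```

If three entries share gsiSortKey 100 with table keys "a", "b" and "c", a start key of (100, "b") resumes at "c". A start key containing only the GSI sort key 100 could not say which of the three was the last one read, which is why the table key has to be part of it.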

Maybe this will help out somebody. I know it stumped me for a while!