2
votes

I'm familiar with MySQL and am starting to use Amazon DynamoDB for a new project.

Assume I have a MySQL table like this:

CREATE TABLE foo (
  id CHAR(64) NOT NULL,
  scheduledDelivery DATETIME NOT NULL,
  -- ...other columns...
  PRIMARY KEY(id),
  INDEX schedIndex (scheduledDelivery)
);

Note the secondary Index schedIndex which is supposed to speed-up the following query (which is executed periodically):

SELECT *
  FROM foo
  WHERE scheduledDelivery <= NOW()
  ORDER BY scheduledDelivery ASC
  LIMIT 100;

That is: Take the 100 oldest items that are due to be delivered.

With DynamoDB I can use the id column as primary partition key.

However, I don't understand how I can avoid full-table scans in DynamoDB. When adding a secondary index I must always specify a "partition key". However, (in MySQL words) I see these problems:

  • the scheduledDelivery column is not unique, so it can't be used as a partition key itself AFAIK
  • adding id as unique partition key and using scheduledDelivery as "sort key" sounds like a (id, scheduledDelivery) secondary index to me, which makes that index pratically useless

I understand that MySQL and DynamoDB require different approaches, so what would be a appropriate solution in this case?

1

1 Answers

1
votes

It's not possible to avoid a full table scan with this kind of query.

However, you may be able to disguise it as a Query operation, which would allow you to sort the results (not possible with a Scan).

You must first create a GSI. Let's name it scheduled_delivery-index.

We will specify our index's partition key to be an attribute named fixed_val, and our sort key to be scheduled_delivery.

fixed_val will contain any value you want, but it must always be that value, and you must know it from the client side. For the sake of this example, let's say that fixed_val will always be 1.

GSI keys do not have to be unique, so don't worry if there are two duplicated scheduled_delivery values.

You would query the table like this:

var now = Date.now();

//...

{
   TableName: "foo",
   IndexName: "scheduled_delivery-index",
   ExpressionAttributeNames: {
       "#f": "fixed_value",
       "#d": "scheduled_delivery"
   },
   ExpressionAttributeValues: {
       ":f": 1,
       ":d": now
   },
   KeyConditionExpression: "#f = :f and #d <= :d",
   ScanIndexForward: true
}