23
votes

I don't get the concept of limits for Query/Scan in DynamoDB. According to the docs:

A single Query operation can retrieve a maximum of 1 MB of data. This limit applies before any FilterExpression is applied to the results.

Let's say I have 10k items, 250 KB per item, and all of them match the query parameters.

  1. If I run a simple query, do I get only 4 items?
  2. If I use ProjectionExpression to retrieve only a single attribute (1 KB in size), will I get 1k items?
  3. If I only need to count items (select: 'COUNT'), will it count all items (10k)?

3 Answers

33
votes

If I run a simple query, I get only 4 items?

Yes

If I use ProjectionExpression to retrieve only a single attribute (1 KB in size), will I get 1k items?

No. FilterExpression and ProjectionExpression are applied after the items have been read, so the 1 MB limit is measured against the full item size. You still get 4 items.

If I only need to count items (select: 'COUNT'), will it count all items (10k)?

No, a single call still counts just 4.

The thing you are probably missing here is that you can still get all 10k results, or the full 10k count; you just need to fetch the results in pages. When a query completes, check the LastEvaluatedKey attribute of the response, and if it's not empty, run the query again with that value passed as ExclusiveStartKey to get the next set of results. Repeat this until the attribute is empty, at which point you know you have all the results.
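To make the paging idea concrete for the count case, here is a minimal sketch that sums Count over every page. It assumes you pass in your own function that performs one Query call; `countAll`, `queryPage` and `makeFakeQuery` are names I made up for illustration, and the fake only stands in for DynamoDB so the loop can be run without a table (its `offset` key is not what a real LastEvaluatedKey looks like).

```javascript
// Sketch: sum the per-page Count values of a paginated Query.
// `queryPage` is a hypothetical async function that performs ONE Query call
// (with the v2 SDK that could be: params => dynamodb.query(params).promise())
// and resolves to a response with Count and, optionally, LastEvaluatedKey.
async function countAll(queryPage, params) {
  let total = 0;
  let pages = 0;
  let startKey; // undefined on the first request

  do {
    const page = await queryPage({
      ...params,
      Select: 'COUNT',
      // Only send ExclusiveStartKey when there is a previous page to resume from.
      ...(startKey ? { ExclusiveStartKey: startKey } : {}),
    });
    total += page.Count;
    pages += 1;
    startKey = page.LastEvaluatedKey;
  } while (startKey);

  return { total, pages };
}

// Stand-in for DynamoDB, used only to exercise the loop above.
// It pretends `totalItems` items match and that `perPage` of them fit in one page.
function makeFakeQuery(totalItems, perPage) {
  return async (params) => {
    const start = params.ExclusiveStartKey ? params.ExclusiveStartKey.offset : 0;
    const end = Math.min(start + perPage, totalItems);
    const page = { Count: end - start };
    if (end < totalItems) {
      page.LastEvaluatedKey = { offset: end };
    }
    return page;
  };
}

// Usage with the fake: 10 matching items, 4 per page, so 3 requests are needed.
countAll(makeFakeQuery(10, 4), { TableName: 'example' })
  .then(({ total, pages }) => console.log(total, pages));
```

Against a real table each of those requests still reads up to 1 MB, so counting this way costs as much as fetching the items.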

EDIT: I should say that some of the SDKs abstract this away for you. For example, the Java SDK (in DynamoDBMapper) has query and queryPage, where query will go back to the server multiple times to get the full result set for you (i.e. in your case, give you the full 10k results), while queryPage returns a single page.

2
votes

For any operation that returns items, you can request a subset of attributes to retrieve; however, doing so has no impact on the item size calculations. In addition, Query and Scan can return item counts instead of attribute values. Getting the count of items uses the same quantity of read capacity units and is subject to the same item size calculations. This is because DynamoDB has to read each item in order to increment the count.

Managing Throughput Settings on Provisioned Tables
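As a rough illustration of what that means for the numbers in the question, here is a back-of-the-envelope estimate. It is a sketch under assumptions, not a billing calculator: it assumes exactly 250 KB per item, exactly 4 items per 1 MB page, and the usual rule that a Query sums the sizes of the items it reads and rounds up to the next 4 KB, with eventually consistent reads costing half. `estimateQueryCost` is a name made up for this example.

```javascript
// Assumed inputs, taken from the question.
const ITEM_KB = 250;        // size of each item
const ITEM_COUNT = 10000;   // items matching the key condition
const ITEMS_PER_PAGE = 4;   // assumed number of items that fit under the 1 MB limit
const RCU_BLOCK_KB = 4;     // one strongly consistent read unit covers up to 4 KB

function estimateQueryCost(consistentRead) {
  const pages = Math.ceil(ITEM_COUNT / ITEMS_PER_PAGE);
  // Sizes of all items read in one call are summed, then rounded up to 4 KB.
  const unitsPerPage = Math.ceil((ITEMS_PER_PAGE * ITEM_KB) / RCU_BLOCK_KB);
  const strong = pages * unitsPerPage;
  // Eventually consistent reads consume half as many read capacity units.
  return { pages, readUnits: consistentRead ? strong : strong / 2 };
}

console.log(estimateQueryCost(false)); // { pages: 2500, readUnits: 312500 }
console.log(estimateQueryCost(true));  // { pages: 2500, readUnits: 625000 }
```

The same estimate applies whether you fetch whole items, project a single attribute, or ask only for a count, because the full items are read in every case.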

2
votes

Great explanation by @f-so-k.

This is how I am handling the query.

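import AWS from 'aws-sdk';

// Low-level DynamoDB client (AWS SDK for JavaScript v2).
const dynamodb = new AWS.DynamoDB();

// Keeps requesting pages until one contains items or there are no more pages,
// then returns that page's response.
async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      // Resume where the previous page stopped.
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    // The response field is `Count` (capital C).
    if (result.Count > 0 || !result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return result;
}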


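// `user` (the table name) and `name` (the key value) are variables defined elsewhere.
const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un = :n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // Return every attribute projected into the index.
  Select: 'ALL_PROJECTED_ATTRIBUTES',
};

const result = await loopQuery(params);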

Edit:

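import AWS from 'aws-sdk';

// Low-level DynamoDB client (AWS SDK for JavaScript v2).
const dynamodb = new AWS.DynamoDB();

// Follows LastEvaluatedKey until it is absent and returns the items from every page.
async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  let list = [];
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      // Resume where the previous page stopped.
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    // Collect this page's items; the response object itself is not iterable.
    list = [...list, ...result.Items];
    if (!result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return list;
}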


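// `user` (the table name) and `name` (the key value) are variables defined elsewhere.
const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un = :n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // Return every attribute projected into the index.
  Select: 'ALL_PROJECTED_ATTRIBUTES',
};

const result = await loopQuery(params);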