2
votes

At the moment I have a function to get all items from a DynamoDB table using the SCAN option. This is an expensive way to do it and I would prefer using the QUERY option. But looking at the docs there does not seem to be a simple way to retrieve all items using the QUERY option - it expects some sort of condition.

Example

var params = {
    TableName : "Movies",
    KeyConditionExpression: "#yr = :yyyy",
    ExpressionAttributeNames:{
        "#yr": "year"
    },
    ExpressionAttributeValues: {
        ":yyyy": 1985
    }
};

docClient.query(params, function(err, data) {
    if (err) {
        console.error("Unable to query. Error:", JSON.stringify(err, null, 2));
    } else {
        console.log("Query succeeded.");
        data.Items.forEach(function(item) {
            console.log(" -", item.year + ": " + item.title);
        });
    }
});

Expected

var params = {
    TableName : "Movies"  
};

docClient.query(params, function(err, data) {
    if (err) {
        console.error("Unable to query. Error:", JSON.stringify(err, null, 2));
    } else {
        console.log("Query succeeded.");
        data.Items.forEach(function(item) {
            console.log(" -", item.year + ": " + item.title);
        });
    }
});

Is it possible to retrieve all data from a table using QUERY? I thought of using BEGINS_WITH or such but all the primary keys are different/random and do not start with a specific character or phrase.

3
Let's say you could get all items using query, how would this be materially better than using scan?jarmod
@jarmod scan operations are generally slower and more expensive as the operation has to iterate through each item in the table to get the items I am requesting. This is the main reason as to why I do not want to use scan..Aleksandar Zoric
But you're getting all items regardless, unless you did not actually mean "all items". In fact, the query route would likely be slower than scan.jarmod
All items is correct. I am going by some research points like in the link below which states that a query operation is expected to be very fast and only marginally slower than a get operation. The scan operation on the other hand can take anywhere from 50-100ms to a few hours to complete and depends on the size of the table. I could be wrong by how I am understanding this.. techtraits.com/cloud/nosql/2012/06/28/…Aleksandar Zoric
They are different. Scan will take longer to find specific items, compared to a query that knows the partition key and sort key (or range of sort keys) in advance. But you're talking about "all items" so there is no known key in advance and hence, in a world where query could theoretically give you all items, it would not be any better than scan. That said, query does not provide a way to get all items, so it's a moot point.jarmod

3 Answers

2
votes

Technically, a query of all items in an Amazon DynamoDB table would return the same amount of data that a scan returns, so there should be no difference in cost.

The usual reduced efficiency of a scan operation is due to the fact that it has to read the whole table and then filters out values to provide the result you want, essentially adding the extra step of removing data from the result set. If you want to read the whole table without filtering, both scan and query have to retrieve all values and there is no additional filtering step.

1
votes

The only way to do via query would be to loop over every partition key individually.

I'd suggest you look at a secondary index built around your query which will be more efficient: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

0
votes

If you want to get all data you can use scan all data, but I recommend you to get data by limit and pagination because it can kill a lot of memory resources if you have millions of data at dynamodb. this approach for getting all your data

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({
    apiVersion: '2012-08-10',
    region: 'ap-southeast-1' // put your region
});

exports.handler = async (event, context, callback) => {
    const tableName = event.params.querystring.tablename; 
     let params = { 
         TableName: tableName
     };

     let scanResults = [];
     let items;

     do {
         items = await docClient.scan(params).promise();
         items.Items.forEach((item) => scanResults.push(item));
         params.ExclusiveStartKey = items.LastEvaluatedKey;
     } while (typeof items.LastEvaluatedKey != "undefined");

     callback(null, scanResults);
   
            
};

But with the approach below, after you get data, you need to post the LastEvaluatedKey from the frontend to params and you can use it as ExclusiveStartKey.

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({
    apiVersion: '2012-08-10',
    region: 'ap-southeast-1' // put your region
});

exports.handler = async (event, context, callback) => {
    const tableName = event.params.querystring.tablename; 
    
    let pageSize = event.params.querystring.pagesize;
    let lastItem = event.params.querystring.lastItem;
        try {
              const params = {
                TableName: tableName,
                Limit: pageSize,
              };
              if (lastItem) {
                params.ExclusiveStartKey = { id: lastItem};
              }
              const response = await docClient.scan(params).promise();
              return {
                 items: response.Items,
                 lastItem: response.LastEvaluatedKey
              };
        
            } catch (error) {
              throw error;
            }
};