15 votes

I have a DynamoDB table with thousands of items. I am scanning the table with the Scan operation and have applied a "BETWEEN" FilterExpression. However, the scan response returns only 3 records, whereas it should return about 100.

I have written the Lambda function in Node.js.

The most likely reason is that the filter expression is filtering out most of the items in each page. DynamoDB reads the table one page at a time and applies the filter only after a page has been read, so to find every match it must still scan the entire table and page through the results. Each response includes only the items from that page that match the filter condition, plus a LastEvaluatedKey that you must pass as ExclusiveStartKey in the next request to continue scanning. This is a bit unintuitive at first, but it makes sense if you think about it a bit more. - Mike Dinescu
To fetch/scan all items from AWS DynamoDB using Node.js, you can refer to: stackoverflow.com/questions/44589967/… - Yuci
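The behaviour described in the first comment can be made concrete with a toy in-memory model. This is a sketch, not the DynamoDB API: the `scanPage` helper and its `pageSize` parameter are hypothetical stand-ins for one Scan request and its 1 MB read limit, and the numeric `LastEvaluatedKey` stands in for the key map DynamoDB really returns. It shows how a single page can yield just 3 matches while roughly 100 more exist further on.

```javascript
// Toy model of ONE Scan request (hypothetical helper, not part of any AWS SDK):
// read up to `pageSize` items starting at `startIndex`, then filter only those.
function scanPage(table, startIndex, pageSize, predicate) {
  const read = table.slice(startIndex, startIndex + pageSize);
  const end = startIndex + read.length;
  return {
    Items: read.filter(predicate), // only matching items are returned
    ScannedCount: read.length,     // every item read counts toward the page limit
    // A continuation key is returned whenever the page stopped before the end.
    // (Real DynamoDB returns a key map here, not an index.)
    LastEvaluatedKey: end < table.length ? end : undefined,
  };
}

// 1,000 items with values 0..999; the "between" filter matches 100..199.
const table = Array.from({ length: 1000 }, (_, i) => ({ id: i, value: i }));
const between = (item) => item.value >= 100 && item.value <= 199;

// Suppose the read limit happens to cut the first page off after 103 items.
const first = scanPage(table, 0, 103, between);
console.log(first.Items.length, first.LastEvaluatedKey); // 3 103
```

Only ids 100, 101 and 102 fall inside the first page, so the caller sees 3 items and a continuation key, even though the other 97 matches are still unread.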

3 Answers

15 votes

Another common issue is whether the scan is repeated until LastEvaluatedKey is empty.

If you are already doing this and still not getting all the items, please show your code so we can look at it in detail.

If the total size of the scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and returns the results read so far, together with a LastEvaluatedKey value that you use to continue the scan in a subsequent operation. The results also include the number of items exceeding the limit. A single scan page can even return no items at all that meet the filter criteria.

If LastEvaluatedKey is empty, then the "last page" of results has been processed and there is no more data to be retrieved.

If LastEvaluatedKey is not empty, it does not necessarily mean that there is more data in the result set. The only way to know when you have reached the end of the result set is when LastEvaluatedKey is empty.
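A minimal sketch of that loop in Node.js, assuming a `fetchPage(params)` function that performs one Scan and resolves to `{ Items, LastEvaluatedKey }`. With the AWS SDK v2 DocumentClient this could be `(p) => docClient.scan(p).promise()`, where `docClient` is your own client instance. The `fakeFetchPage` below is a hypothetical in-memory stand-in so the example runs without AWS; note that its first two pages come back empty, yet the loop keeps going until there is no LastEvaluatedKey.

```javascript
// Repeats a paged request until LastEvaluatedKey is absent, collecting all items.
// `fetchPage` is any function returning a promise of { Items, LastEvaluatedKey }.
async function scanAll(fetchPage, params) {
  const items = [];
  let lastKey;
  do {
    const page = await fetchPage(
      lastKey ? { ...params, ExclusiveStartKey: lastKey } : params
    );
    items.push(...page.Items);
    lastKey = page.LastEvaluatedKey; // undefined once the last page has been read
  } while (lastKey);
  return items;
}

// Hypothetical stand-in for a filtered Scan: reads 10 items per page and
// keeps only ids 25..34, so pages 1 and 2 return no items at all.
const data = Array.from({ length: 50 }, (_, i) => ({ id: i }));
async function fakeFetchPage(params) {
  const start = params.ExclusiveStartKey ? params.ExclusiveStartKey.id + 1 : 0;
  const read = data.slice(start, start + 10);
  const last = read[read.length - 1];
  return {
    Items: read.filter((item) => item.id >= 25 && item.id <= 34),
    LastEvaluatedKey:
      start + read.length < data.length ? { id: last.id } : undefined,
  };
}

scanAll(fakeFetchPage, { TableName: 'MyTable' }).then((items) => {
  console.log(items.length); // 10
});
```

Stopping after the first response would return nothing here; only looping on LastEvaluatedKey finds all 10 matches.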

8 votes

Here's example code (Java, AWS SDK v1) to get all results:

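```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import java.util.Map;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
Map<String, AttributeValue> lastKeyEvaluated = null;
do {
    ScanRequest sr = new ScanRequest()
            .withTableName("tableName")
            .withProjectionExpression("id")
            // null on the first request, then the key returned by the previous page
            .withExclusiveStartKey(lastKeyEvaluated);
    ScanResult result = client.scan(sr);
    for (Map<String, AttributeValue> item : result.getItems()) {
        System.out.println(item.get("id").getS());
    }
    // null once the last page has been read
    lastKeyEvaluated = result.getLastEvaluatedKey();
} while (lastKeyEvaluated != null);
```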
1 vote

In Node.js I actually use Query rather than Scan to retrieve the items. A single Query operation returns at most 1 MB of data, so I wrote a recursive function that keeps querying and concatenating the results for as long as the response contains a LastEvaluatedKey. When LastEvaluatedKey is absent, there is no more data. My function queries a global secondary index (GSI). Query is generally faster and cheaper than Scan, because it reads only the items with a given partition key value instead of the whole table.

The getItemByGSI function accepts many parameters for filtering and customizing the query, which can be useful. You can remove any parameters your use case doesn't need.

So getAllItemsByGSI can be used to retrieve all matching data from DynamoDB, and getItemByGSI to run a single Query (one page of results).

```javascript
'use strict';

const omitBy = require('lodash/omitBy');
const isNil = require('lodash/isNil');
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2

// Reuse one DocumentClient instead of creating a new one on every call.
const dynamoDb = new AWS.DynamoDB.DocumentClient();

// Calls a DocumentClient method (e.g. 'query') and returns its promise.
const call = (action, params) => dynamoDb[action](params).promise();

// Runs a single Query against a GSI and returns one page of results
// (Items, plus LastEvaluatedKey when more data remains).
const getItemByGSI = async ({
    TableName,
    IndexName,
    attribute,
    value,
    sortKey,
    sortValue,
    filter,
    filterValue,
    operator,
    filter1,
    filterValue1,
    LastEvaluatedKey,
    ScanIndexForward,
    Limit,
}) => {
    let params = {
        TableName,
        IndexName,
        KeyConditionExpression: '#attrKey = :attrValue',
        ExpressionAttributeValues: { ':attrValue': value },
        ExpressionAttributeNames: { '#attrKey': attribute },
        ExclusiveStartKey: LastEvaluatedKey,
        Limit,
        FilterExpression: null,
    };
    // Optional sort key condition; the placeholder must be :sortValue in both places.
    if (sortKey && sortValue) {
        params.KeyConditionExpression += ' and #sortKey = :sortValue';
        params.ExpressionAttributeNames['#sortKey'] = sortKey;
        params.ExpressionAttributeValues[':sortValue'] = sortValue;
    }
    // Optional first filter condition.
    if (filter && filterValue) {
        params.FilterExpression = `#${filter} = :${filter}`;
        params.ExpressionAttributeNames[`#${filter}`] = filter;
        params.ExpressionAttributeValues[`:${filter}`] = filterValue;
    }
    // Optional second filter condition, joined with `operator` ('and' / 'or').
    if (filter && filterValue && operator && filter1 && filterValue1) {
        params.FilterExpression += ` ${operator} #${filter1} = :${filter1}`;
        params.ExpressionAttributeNames[`#${filter1}`] = filter1;
        params.ExpressionAttributeValues[`:${filter1}`] = filterValue1;
    }
    // Drop null/undefined keys (e.g. no ExclusiveStartKey on the first page).
    params = omitBy(params, isNil);
    if (ScanIndexForward === false) params.ScanIndexForward = false;
    return call('query', params);
};

// Keeps querying while the response has a LastEvaluatedKey; returns all items.
const getAllItemsByGSI = async (data) => {
    const page = await getItemByGSI(data);
    let finalData = page.Items;
    if (page.LastEvaluatedKey) {
        const rest = await getAllItemsByGSI({
            ...data,
            LastEvaluatedKey: page.LastEvaluatedKey,
        });
        finalData = finalData.concat(rest);
    }
    return finalData;
};

module.exports = {
    getItemByGSI,
    getAllItemsByGSI,
};
```
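The expression-building step is where these helpers are easiest to get wrong: every `#name` and `:value` placeholder used in KeyConditionExpression or FilterExpression must have a matching entry in ExpressionAttributeNames / ExpressionAttributeValues. One option is to pull that step into a pure function that can be tested without calling AWS. The sketch below uses a hypothetical `buildQueryParams` helper (not part of any SDK) covering the partition key, an optional sort key, one optional equality filter and the continuation key; the table and index names in the usage are made up.

```javascript
// Hypothetical helper: builds DocumentClient Query params for a GSI lookup.
function buildQueryParams({
  TableName, IndexName, attribute, value,
  sortKey, sortValue, filter, filterValue, LastEvaluatedKey,
}) {
  const params = {
    TableName,
    IndexName,
    KeyConditionExpression: '#pk = :pk',
    ExpressionAttributeNames: { '#pk': attribute },
    ExpressionAttributeValues: { ':pk': value },
  };
  // Each placeholder added here gets a matching name and value entry.
  if (sortKey !== undefined && sortValue !== undefined) {
    params.KeyConditionExpression += ' and #sk = :sk';
    params.ExpressionAttributeNames['#sk'] = sortKey;
    params.ExpressionAttributeValues[':sk'] = sortValue;
  }
  if (filter !== undefined && filterValue !== undefined) {
    params.FilterExpression = '#f = :f';
    params.ExpressionAttributeNames['#f'] = filter;
    params.ExpressionAttributeValues[':f'] = filterValue;
  }
  // Only send ExclusiveStartKey when continuing from a previous page.
  if (LastEvaluatedKey) params.ExclusiveStartKey = LastEvaluatedKey;
  return params;
}

const params = buildQueryParams({
  TableName: 'Orders',
  IndexName: 'customerId-index',
  attribute: 'customerId',
  value: 'c-1',
  sortKey: 'orderDate',
  sortValue: '2021-01-01',
});
console.log(params.KeyConditionExpression); // #pk = :pk and #sk = :sk
```

Unlike the truthiness checks in getItemByGSI, this sketch compares against `undefined`, so a legitimate value such as `0` or `false` still produces a condition.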