1
votes

I am new to AWS. We currently use DynamoDB to store the logs of each job execution on a daily basis, and each day we generate a summary report from all the data that was posted to DynamoDB on the previous day.

I am facing an issue fetching the data from DynamoDB when generating the summary report. To fetch the data, I am using the AWS Java client from a Scala class. The problem is that, for any filter condition, I cannot retrieve all the matching data from DynamoDB, even though the DynamoDB console shows many more records.

I am using the code below:

    val client: AmazonDynamoDB = AmazonDynamoDBClientBuilder.standard.build

    // Helper that returns the filter expression and its expression attribute values
    val (filterExpression, expressionAttributeValues) = getDynamoDBQuery(inputArgs)

    val scanRequest: ScanRequest = new ScanRequest()
      .withTableName("table_name")
      .withFilterExpression(filterExpression)
      .withExpressionAttributeValues(expressionAttributeValues)

    client.scan(scanRequest)

After a lot of analysis, it looks like DynamoDB takes a while to fetch all the data for a filter condition (when we scan the table), and the Java client does not wait until all the records have been retrieved. Is there any workaround for this? Please help.

Thanks

3 Answers

4
votes

DynamoDB returns results in a paginated manner. For a given ScanRequest, the ScanResult exposes getLastEvaluatedKey, whose value should be passed via setExclusiveStartKey on the next ScanRequest to get the next page. Keep looping until getLastEvaluatedKey on a ScanResult returns null.
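To make the loop concrete without depending on the SDK, here is a minimal sketch of the same pattern in plain Scala. `Page`, `fetchAll` and `fakeScan` are hypothetical names made up for illustration; `fakeScan` stands in for `client.scan` and serves a 10-row in-memory "table" four rows at a time.

```scala
object PaginationSketch {
  // Hypothetical stand-in for ScanResult: one page of items plus the key to resume from.
  final case class Page[K, T](items: Seq[T], lastEvaluatedKey: Option[K])

  // Keep requesting pages until a page comes back without a resume key.
  def fetchAll[K, T](fetchPage: Option[K] => Page[K, T]): Vector[T] = {
    @annotation.tailrec
    def loop(startKey: Option[K], acc: Vector[T]): Vector[T] = {
      val page = fetchPage(startKey)
      val collected = acc ++ page.items
      page.lastEvaluatedKey match {
        case None => collected              // no key: this was the last page
        case next => loop(next, collected)  // key present: ask for the next page
      }
    }
    loop(None, Vector.empty)
  }

  // Fake table of 10 rows served 4 at a time; the "key" is the next row offset.
  val rows: Vector[Int] = (1 to 10).toVector
  val pageSize = 4

  def fakeScan(start: Option[Int]): Page[Int, Int] = {
    val from = start.getOrElse(0)
    val next = if (from + pageSize < rows.size) Some(from + pageSize) else None
    Page(rows.slice(from, from + pageSize), next)
  }

  def main(args: Array[String]): Unit = {
    println(fetchAll[Int, Int](fakeScan).size) // prints 10
  }
}
```

With the real client, the function passed to `fetchAll` would presumably call `client.scan` with the start key set on the request and wrap `getLastEvaluatedKey` in `Option(...)`, so that a null key ends the loop.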

BTW, I agree with the previous answer that DynamoDB may not be an ideal choice to store this kind of data from a cost perspective, but you are a better judge of the choice made!

2
votes

DynamoDB is not meant for the purpose you are using it for. Not only is storage costlier, but querying the data will also be costlier.

DynamoDB is meant to be a transactional key-value store.

You can deliver the logs through Firehose to S3 and query them with Athena. That is cheaper, scalable, and well suited to analytical use.

Log --> Firehose --> S3 --> Athena

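With regard to your question, DynamoDB will not return all the records in a single request. A Scan call returns one page of results (at most 1 MB of data, read before the filter expression is applied) together with a LastEvaluatedKey that you use to request the next page.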
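Because the filter is applied after a page has been read, a single call can return few or even zero matching items and still hand back a key for the next page. The following is a rough simulation of that behaviour in plain Scala, not SDK code. `Row`, `table`, `scanPage` and `scanAll` are hypothetical names, and the page size of 3 rows is an arbitrary stand-in for the real per-call limit.

```scala
object FilteredScanSketch {
  final case class Row(id: Int, status: String)

  // Hypothetical 12-row table; only rows 3, 7 and 11 are FAILED.
  val table: Vector[Row] =
    (0 until 12).map(i => Row(i, if (i % 4 == 3) "FAILED" else "OK")).toVector

  // Simulates one Scan call: read `pageSize` rows first, filter them afterwards.
  // Returns the matching rows and the offset to resume from (None on the last page).
  def scanPage(from: Int, pageSize: Int, keep: Row => Boolean): (Vector[Row], Option[Int]) = {
    val read = table.slice(from, from + pageSize)
    val next = if (from + pageSize < table.size) Some(from + pageSize) else None
    (read.filter(keep), next)
  }

  // Follows the resume offset until it is absent, even across empty pages.
  def scanAll(pageSize: Int, keep: Row => Boolean): Vector[Row] = {
    var acc = Vector.empty[Row]
    var next: Option[Int] = Some(0)
    while (next.isDefined) {
      val (items, resumeAt) = scanPage(next.get, pageSize, keep)
      acc = acc ++ items
      next = resumeAt
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val failed: Row => Boolean = _.status == "FAILED"
    val (firstItems, resumeAt) = scanPage(0, 3, failed)
    println(firstItems.size)              // prints 0
    println(resumeAt)                     // prints Some(3)
    println(scanAll(3, failed).map(_.id)) // prints Vector(3, 7, 11)
  }
}
```

The first call matches nothing, yet three matching rows exist further on. This is why the loop should stop only when the key is missing, not when a page comes back empty.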

More details are in the DynamoDB Scan documentation:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html

Hope it helps.

0
votes

Thanks @Vikdor for your help. I did it the way you suggested and it worked perfectly. Below is the code:

    import java.util
    import com.amazonaws.services.dynamodbv2.{AmazonDynamoDB, AmazonDynamoDBClientBuilder}
    import com.amazonaws.services.dynamodbv2.model.{AttributeValue, ScanRequest}

    val client: AmazonDynamoDB = AmazonDynamoDBClientBuilder.standard.build

    // Helper that returns the filter expression and its expression attribute values
    val (filterExpression, expressionAttributeValues) = getDynamoDBQuery(inputArgs)

    val scanRequest: ScanRequest = new ScanRequest()
      .withTableName("watchman-jobs")
      .withFilterExpression(filterExpression)
      .withExpressionAttributeValues(expressionAttributeValues)

    // Collect every page into our own list instead of appending to the SDK's result list.
    val items = new util.ArrayList[util.Map[String, AttributeValue]]()
    var lastEvaluatedKey: util.Map[String, AttributeValue] = null
    do {
      // A null start key means "start from the beginning", so the first page is scanned only once.
      val scanResult = client.scan(scanRequest.withExclusiveStartKey(lastEvaluatedKey))
      items.addAll(scanResult.getItems)
      lastEvaluatedKey = scanResult.getLastEvaluatedKey
    } while (lastEvaluatedKey != null && !lastEvaluatedKey.isEmpty)

    return items