
I have a DynamoDB table that contains information about the status of different cron jobs.

Table attributes:

  • id [HashKey]
  • jobId [RangeKey]
  • status ('failed', 'pending', 'success')

I want to query the items based on the job status field.

For example: list all jobs that are in the pending state.

So I created the following GSI:

GSI:

{
  IndexName: 'StatusIndex',
  KeySchema: [
    {
      AttributeName: 'status',
      KeyType: 'HASH',
    },
  ],
  Projection: {
    ProjectionType: 'ALL',
  },
},
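The comments ask how the index is being queried. As a point of reference, here is a minimal sketch of fetching every pending job from `StatusIndex`. It assumes a DocumentClient-style `send` function that accepts Query input with plain JavaScript values and returns the response; the table name `CronJobStatus` is hypothetical. Because `status` is a DynamoDB reserved word, it is aliased through `ExpressionAttributeNames`.

```javascript
// Hypothetical table name; the question does not give one.
const TABLE_NAME = "CronJobStatus";

// Build Query input for StatusIndex. "status" is a DynamoDB reserved word,
// so it is referenced through the "#s" placeholder.
function buildStatusQuery(status, exclusiveStartKey) {
  const input = {
    TableName: TABLE_NAME,
    IndexName: "StatusIndex",
    KeyConditionExpression: "#s = :s",
    ExpressionAttributeNames: { "#s": "status" },
    ExpressionAttributeValues: { ":s": status },
  };
  if (exclusiveStartKey) input.ExclusiveStartKey = exclusiveStartKey;
  return input;
}

// Each Query call returns at most 1 MB of data, so fetching every matching
// item means following LastEvaluatedKey page by page, one request at a time.
// `send` is assumed to execute a Query input and resolve to the response.
async function queryAllByStatus(send, status) {
  const items = [];
  let startKey;
  do {
    const page = await send(buildStatusQuery(status, startKey));
    items.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}
```

With the AWS SDK for JavaScript v3, `send` could be `(input) => docClient.send(new QueryCommand(input))`, where `QueryCommand` comes from `@aws-sdk/lib-dynamodb`. The sequential page loop is the reason a single large Query cannot be sped up by itself.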

But the query on the GSI is very slow when all the items contain the same status value:

id  jobId  status
1   job1   pending
2   job2   pending
3   job3   pending
4   job4   pending

Is this because the index has no range key?

How slow? What performance are you seeing and what are you expecting? Can you show us how you are querying the index? - Seth Geoghegan
And how many items in the GSI? - Mike Dinescu

1 Answer


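You might be better off with a Parallel Scan here. A Query has no parallel mode: each call returns at most 1 MB of data, and the rest has to be fetched by following LastEvaluatedKey one page at a time, so retrieving a very large amount of data through one Query will be slow. If you use a Parallel Scan, set the number of threads (segments) to match the number of MBs of data in your table to optimise the speed. This will cost you more RCUs than a Query, because a Scan reads every item and the status filter is applied only after the read.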
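A minimal sketch of such a Parallel Scan, assuming the same kind of DocumentClient-style `send` function (it executes a Scan input and resolves to the response). In Node.js the segments run as concurrent promises rather than threads; each segment is paginated independently with `LastEvaluatedKey`, and the results are merged at the end. The segment count is left as a parameter.

```javascript
// Scan input for one segment. The status filter is applied after items are
// read, so every item in the segment still consumes read capacity.
function buildSegmentScan(tableName, segment, totalSegments, status, startKey) {
  const input = {
    TableName: tableName,
    Segment: segment,
    TotalSegments: totalSegments,
    FilterExpression: "#s = :s",
    ExpressionAttributeNames: { "#s": "status" }, // "status" is reserved
    ExpressionAttributeValues: { ":s": status },
  };
  if (startKey) input.ExclusiveStartKey = startKey;
  return input;
}

// Pages through a single segment until no LastEvaluatedKey is returned.
async function scanSegment(send, tableName, segment, totalSegments, status) {
  const items = [];
  let startKey;
  do {
    const page = await send(
      buildSegmentScan(tableName, segment, totalSegments, status, startKey)
    );
    items.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}

// Starts every segment concurrently and merges the results.
async function parallelScanByStatus(send, tableName, totalSegments, status) {
  const segments = Array.from({ length: totalSegments }, (_, i) =>
    scanSegment(send, tableName, i, totalSegments, status)
  );
  return (await Promise.all(segments)).flat();
}
```

Segments are numbered `0` to `totalSegments - 1`; DynamoDB decides which items fall into each one, so the merged result has no particular order.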

Alternatively, you can consider remodeling your data. You will need a way to spread the items you want across several partition key values, so they can be fetched with multiple Queries, and a way of running those Queries in parallel from your client. One option you can consider is breaking the data down into time series.
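One way to sketch the time-series idea, purely as an illustration: store a hypothetical `statusDay` attribute such as `pending#2024-01-15` on each item, index it with a hypothetical GSI named `StatusDayIndex`, and issue one Query per day concurrently. The attribute, index and table names are assumptions, not part of the original schema, and `send` is again assumed to be a DocumentClient-style function.

```javascript
// Hypothetical composite key: one GSI partition per status per UTC day,
// e.g. "pending#2024-01-15". This attribute would be written on every item.
function statusDayKey(status, date) {
  return `${status}#${date.toISOString().slice(0, 10)}`;
}

// The last `days` UTC calendar days, newest first.
function recentDays(days, now = new Date()) {
  return Array.from({ length: days }, (_, i) => {
    const d = new Date(now);
    d.setUTCDate(d.getUTCDate() - i);
    return d;
  });
}

// Fetches every page for one day's partition key.
async function queryDay(send, key) {
  const items = [];
  let startKey;
  do {
    const input = {
      TableName: "CronJobStatus", // hypothetical table name
      IndexName: "StatusDayIndex", // hypothetical GSI on "statusDay"
      KeyConditionExpression: "#k = :k",
      ExpressionAttributeNames: { "#k": "statusDay" },
      ExpressionAttributeValues: { ":k": key },
    };
    if (startKey) input.ExclusiveStartKey = startKey;
    const page = await send(input);
    items.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}

// One Query per day, all issued concurrently, results merged newest day first.
async function queryStatusForDays(send, status, days, now) {
  const keys = recentDays(days, now).map((d) => statusDayKey(status, d));
  const results = await Promise.all(keys.map((k) => queryDay(send, k)));
  return results.flat();
}
```

The trade-off is that the client must know which days to ask for: pending jobs older than the window are not returned unless the window is widened.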