We are processing a huge file by splitting the file to multiple parts of 200 rows each (storing them in a S3 bucket and processing each file). Each part file has an ID (Partition Key) and the Timestamp is the Sort Key.
I'm looking to find the total count (across multiple IDs/part files) by different statuses (SUCCESS, FAILURE). For eg:
200000 records were successful (Status=Success) within the last 4 hours.
200 records were failed (Status=Failure) due to errorStatus "FAILURE :: Could not open JDBC Connection" within the past 4 hours
158 records were failed (Status=Failure) due to errorStatus "FAILURE :: Network failed" within the past 4 hours.
I'm able to get them by each bID separately. For eg.
aws dynamodb query --table-name abc1 --index-name abcGdx1 --projection-expression "TId" --key-condition-expression "BId = :bId and STimestamp between :sortkeyval1 and :sortkeyval2" --filter-expression "PStatus = :status and PStage = :stage" --expression-attribute-values "{\":bId\": {\"S\": \"c1234-5678-1000\"}, \":stage\": {\"S\": \"C_C\"}, \":status\": {\"S\": \"SUCCESS\"}, \":sortkeyval1\": {\"S\": \"2020-09-22T22:00:42.108-04:00\"}, \":sortkeyval2\": {\"S\": \"2020-09-23T18:52:55.724-04:00\"}}" --return-consumed-capacity TOTAL
Can you please help with an idea on how this can be achieved