3
votes

I am running a simple query against my S3 bucket with CloudTrail logs. The bucket is big, and after around 1 minute and 45 seconds I get the error

HIVE_CURSOR_ERROR: Please reduce your request rate.

Is there a way to limit the request rate against my S3 bucket from within Athena?

SELECT *
FROM default.cloudtrail_logs_cloudtraillog
WHERE eventname = 'DeleteUser' AND awsregion = 'us-east-1'
Can you tell me the size of each file in the S3 path? – Prabhakar Reddy
These are CloudTrail logs, so they range from bytes to KB. – Min
Can you try converting these small files into files of at least 124 MB each and retry the same query? You are getting this error because of the number of files being scanned in S3. – Prabhakar Reddy
Yeah, that's the thing: CloudTrail is writing to the bucket automatically, and I do not want to have to figure out a "fix" or workaround. I wonder if anybody else has had this issue and what the solution to the problem was. Your suggestion might work, but how would it be implemented? A Lambda writing the archive into a separate bucket? – Min

2 Answers

2
votes

So I will summarize the solutions suggested by AWS. None of them are great, and I wonder why AWS would not throttle on their end instead of throwing the error.

By default, S3 scales automatically to support very high request rates. As your request rate grows, S3 automatically partitions your bucket as needed to support it. However, sometimes it still errors out, so they suggest waiting (they do not suggest a time frame) to give S3 enough time to auto-partition your bucket based on the request rate it is receiving.

They also suggest:

1) Using the S3DistCp utility to combine the small files into larger objects: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html

2) Partitioning the table: https://docs.aws.amazon.com/athena/latest/ug/partitions.html
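For option 2, if the table is defined with partition columns (the names year and month below are illustrative assumptions; they depend on your table DDL), the query from the question can prune most of the bucket instead of listing every object:

```sql
SELECT *
FROM default.cloudtrail_logs_cloudtraillog
WHERE eventname = 'DeleteUser'
  AND awsregion = 'us-east-1'
  -- partition predicates: Athena only lists the S3 prefixes for these partitions
  AND year = '2019'
  AND month = '08'
```

With the partition predicates in place, Athena only lists and reads the matching S3 prefixes, which cuts the request rate against the bucket.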
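For option 1, a sketch of an S3DistCp step (run on an EMR cluster) that groups CloudTrail logs per region and day into roughly 128 MB objects. The bucket names, account ID, and grouping regex below are placeholder assumptions you would adapt to your own key layout:

```shell
s3-dist-cp \
  --src  s3://my-cloudtrail-bucket/AWSLogs/123456789012/CloudTrail/ \
  --dest s3://my-combined-bucket/cloudtrail/ \
  --groupBy '.*CloudTrail/([a-z0-9-]+/[0-9]{4}/[0-9]{2}/[0-9]{2}).*' \
  --targetSize 128
```

--groupBy concatenates all source files whose keys match the same capture group (here region/year/month/day), and --targetSize aims the combined output files at about 128 MB.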

1
vote

I got the same answer from AWS Support. Since I was doing a one-off analysis, I ended up writing a script to copy a small date range's worth of logs to a separate bucket and using Athena to analyze the smaller dataset.
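A minimal version of such a script, assuming the standard CloudTrail key layout (AWSLogs/&lt;account-id&gt;/CloudTrail/&lt;region&gt;/YYYY/MM/DD/); the bucket names, account ID, region, and dates are placeholders:

```shell
#!/bin/sh
# Copy three days of CloudTrail logs into a separate analysis bucket.
for day in 01 02 03; do
  aws s3 cp \
    "s3://my-cloudtrail-bucket/AWSLogs/123456789012/CloudTrail/us-east-1/2019/08/${day}/" \
    "s3://my-analysis-bucket/cloudtrail-subset/2019/08/${day}/" \
    --recursive
done
```

Pointing a separate Athena table at s3://my-analysis-bucket/cloudtrail-subset/ then lets you query only the copied subset.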