Mitigate "Throttling: Rate exceeded" errors on terraform 0.11.7 apply to AWS

Question

Does anyone know how to mitigate throttling when using terraform 0.11.7?

terraform-0.11.7 plan  -out proposed.plan -no-color 

Error: Error refreshing state: 1 error(s) occurred:
* module.ecs_alb.aws_alb_target_group.backend_internal_alb_target_group: 1 error(s) occurred:
* module.ecs_alb.aws_alb_target_group.backend_internal_alb_target_group[5]:
aws_alb_target_group.backend_internal_alb_target_group.5:
Error retrieving Target Group Attributes:
Throttling: Rate exceeded
status code: 400, request id: ...
make: *** [plan] Error 1

I run these from Jenkins, so I wrap the call in a try/catch retry loop like the one below. We run terraform via make, and the tf commands would be plan, apply, and output. I'm waiting 10s between retries and will probably bump that up to something longer.

// retry, retries, waitSecs, targets, path and outputDir are set outside this snippet (not shown)
while (retry < retries) {
    try {
        makeError = null
        sh "make ${targets.join(' ')}"
        break  // success: stop retrying
    } catch (Exception ex) {
        // keep the logs from the failed attempt
        fileOperations([fileCopyOperation(excludes: '', flattenFiles: false, includes: '**log',
                renameFiles: false, sourceCaptureExpression: '',
                targetLocation: outputDir, targetNameExpression: '')])
        makeError = ex
        errorHandling.addResult('runMake', "path: ${path}, targets: ${targets}, retry: ${retry} of ${retries} failed with ${makeError}.  retrying")
        // fixed wait between attempts
        sleep time: waitSecs, unit: 'SECONDS'
    } finally {
        retry++
    }
}
// makeError is still set only if the final attempt failed
if (makeError) {
    throw new Exception("Max retries reached (${retries})", makeError)
}

The error is because of AWS limits. You need to check the limits configured for the service you are trying to use. — error404
@error404 Hmm, we have 600ish target groups and a limit of 3000, so the "Rate exceeded" error seems linked to the number of actions we are executing. We see this issue when creating dev environments: folks can create them as they like for any purpose, and sometimes we have 10+ being created at the same time. This is usually when we hit this class of problem, so it seems like too many API calls in too little time. — Peter Kahn
I'm not seeing the actual terraform command you are running anywhere in your question. Are you setting the -parallelism setting? — Mark B
@MarkB Sorry about that. Updated the question with details. I don't see parallelism in the command or anything about parallel execution at the plan or module level. It seems like the parallel issue is that we've got multiple Jenkins jobs running at the same time for different envs in the same AZ. — Peter Kahn

Peter Kahn · Accepted Answer · 2020-09-18T17:27:44

It seems that there are a few options (these come from commenters and coworkers):

Change the retry logic from "wait N sec per retry" to "wait N sec, retry, double the wait time, repeat" (simple, and it has worked for now).
Change -parallelism from the default of 10 to something lower (simple to do).
Change the plan's module from creating so many task groups in a single module to using multiple modules and plans (this becomes an issue because shifting state from one plan to another would force removal and recreation, or state surgery to avoid that problem).
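
The first option (double the wait after each failed attempt) could look roughly like the shell sketch below. This is only an illustration, not the Jenkins code actually in use: `retry_with_backoff` and the `SLEEP_CMD` override are hypothetical names, and the `make plan` target in the usage comment is an assumption.

```shell
# Hypothetical helper: run a command, and on failure wait, retry, and
# double the wait time, up to a maximum number of attempts.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_WAIT_SECS COMMAND [ARGS...]
# SLEEP_CMD (assumed name) can be overridden to avoid real delays in tests.
retry_with_backoff() {
    local max_retries=$1 wait_secs=$2
    shift 2
    local attempt=1
    while true; do
        # Run the wrapped command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        # Give up once the attempt budget is used.
        if [ "$attempt" -ge "$max_retries" ]; then
            echo "Max retries reached ($max_retries)" >&2
            return 1
        fi
        echo "attempt $attempt failed; waiting ${wait_secs}s before retrying" >&2
        "${SLEEP_CMD:-sleep}" "$wait_secs"
        # Exponential backoff: double the wait for the next round.
        wait_secs=$((wait_secs * 2))
        attempt=$((attempt + 1))
    done
}

# Example (assumed make target): waits 10s, 20s, 40s, 80s between 5 attempts
# retry_with_backoff 5 10 make plan
```

Doubling the wait spreads concurrent jobs out over time, which matters when several environment builds hit the same AWS API at once. Adding a small random offset to each wait would spread them further, but that is not something the answer above tested.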

Given that the first one seems to be working, there seems to be no need to move to the others. If I later find that it fails, I'll experiment with tuning down parallelism just for this plan/module pair.
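
If it does come to tuning parallelism, a minimal sketch of passing a lower `-parallelism` value to the plan command from the question might be the following. The `TERRAFORM` and `PARALLELISM` variables and the function name are hypothetical, and the value 5 is an arbitrary starting point rather than a tested recommendation.

```shell
# Assumed variables: TERRAFORM is the binary name used in the question,
# PARALLELISM is a hypothetical knob (terraform's default is 10).
TERRAFORM="${TERRAFORM:-terraform-0.11.7}"
PARALLELISM="${PARALLELISM:-5}"

# Same plan command as in the question, with fewer concurrent operations.
plan_with_limited_parallelism() {
    "$TERRAFORM" plan -parallelism="$PARALLELISM" -out proposed.plan -no-color
}
```

Because the value is read from a variable, it could be lowered only for the plan/module pair that hits the throttle and left at the default everywhere else.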
