Error creating application autoscaling target on AWS when using Terraform - Defining `scalable_resource` for custom `aws_appautoscaling_target`

Question

Goal

I'm implementing an auto-scaling solution for Kinesis data streams.

One possible solution, which I am following, is well documented in the aws-samples/aws-application-auto-scaling-kinesis repo. However, the sample code utilises a cloudformation yaml template. I wish to define the same using terraform.

Story so far

I am trying to create a scaling target for my_custom_resource:

resource "aws_appautoscaling_target" "my_custom_resource" {
  resource_id        = "https://${aws_api_gateway_rest_api.my_api.id}.execute-api.${var.region}.amazonaws.com/prod/scalableTargetDimensions/${var.stream}"
  scalable_dimension = "custom-resource:ResourceType:Property"
  service_namespace  = "custom-resource"
}

All the attributes have been built by following the AWS Auto-Scaling docs

The same resource is created using CloudFormation in the linked AWS repo:

KinesisAutoScaling:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  DependsOn: LambdaScaler
  Properties:
    ResourceId: !Sub https://${MyApi}.execute-api.${AWS::Region}.amazonaws.com/prod/scalableTargetDimensions/${MyKinesisStream}
    ScalableDimension: 'custom-resource:ResourceType:Property'
    ServiceNamespace: custom-resource

Note: Irrelevant attributes omitted for brevity.

Problem

terraform apply produces the following error:

Error: Error creating application autoscaling target:  
ValidationException: Validation failed for resource:  
https://k5df89sd23.execute-api.us-west-1.amazonaws.com/prod/scalableTargetDimensions/my-test-stream,  
scalable dimension: custom-resource:ResourceType:Property.  
Reason: Scalable resource not found

  on application-autoscaling.tf line 9, in resource "aws_appautoscaling_target" "my_custom_resource":  
   9: resource "aws_appautoscaling_target" "my_custom_resource" {

What might be wrong in the terraform definition?

I am aware that Terraform supports CloudFormation Templates using aws_cloudformation_stack - a workaround I genuinely wish to avoid.

@matt-schuchard - do you think there isn't a flaw in my resource definition? I'm quite new with terraform, so can't be certain. — bPratik
Never mind; just double checked the documentation and your value for the resource_id is incorrect: terraform.io/docs/providers/aws/r/…. — Matt Schuchard
@MattSchuchard - On that page, if you scroll down to the attributes and follow the link mentioned in resource_id, that leads to AWS docs. There is a section for Custom Resources on there which is what I have followed to arrive at my resource_id. Is it still wrong? — bPratik

Jean-Benoit Harvey · Accepted Answer · 2020-12-15T19:30:11

DISCLAIMER: I know the question was asked a while ago, but I'm answering anyway in case it helps someone else... It certainly would have helped me.

Assuming you're trying to reproduce the content of the link you put in the question, then here's what I think could be wrong:

Certainly part of the answer : `aws_appautoscaling_target.my_custom_resource.resource_id`

If you have created an aws_api_gateway_deployment resource to reproduce the MyApi part of the aws-sample, then you're in luck!
You can use this:

resource "aws_appautoscaling_target" "my_custom_resource" {
  # ...
  resource_id = "${aws_api_gateway_deployment.gateway.invoke_url}/scalableTargetDimensions/${var.stream}"
  # ...
}

If you want details on how to get this ^ working, see below...

But keep in mind that this might not be enough!
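To sanity-check what that resource_id should evaluate to, here is a minimal Python sketch. It assumes the deployment's invoke_url has the usual `https://<api-id>.execute-api.<region>.amazonaws.com/<stage>` shape; the function name `build_resource_id` is hypothetical, and the sample values are the ones from the error message in the question.

```python
def build_resource_id(api_id: str, region: str, stage: str, stream: str) -> str:
    """Compose the URL that Application Auto Scaling uses as the ResourceId.

    Assumption: invoke_url is https://<api-id>.execute-api.<region>.amazonaws.com/<stage>.
    """
    invoke_url = f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}"
    return f"{invoke_url}/scalableTargetDimensions/{stream}"


if __name__ == "__main__":
    # Values taken from the error message in the question.
    print(build_resource_id("k5df89sd23", "us-west-1", "prod", "my-test-stream"))
```

If the printed URL matches the one in your ValidationException, the resource_id itself is probably fine and the problem is more likely in what sits behind that URL.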

Probably part of the solution too : the internal linking and permissions required for the creation of the `aws_appautoscaling_target`

Also, this still assumes you're following the same example and that you've implemented most or all of it with Terraform...

DISCLAIMER: I'm not certain about the exact reason for the rest of this answer. I suspect that it has something to do with the requirements of the internal API of the AWS Auto Scaling service.

TL;DR: Everything needs to be connected and working before you can register the autoscaling target

A part of the README.md of this other project, which covers the integration you're trying to achieve, hints at this.

Long version; What I had to fix in my implementation:

Lambda response format
API Gateway settings and wiring
Correct IAM permissions
Lambda and APIGateway in good working order

1. Make sure you have the right lambda response

The format of the lambda's return value is crucial for the creation of the aws_appautoscaling_target.
It needs to be EXACTLY:

    returningJson = {
      "actualCapacity": float(actualCapacity),
      "desiredCapacity": float(desiredCapacity),
      "dimensionName": resourceName,
      "resourceName": resourceName,
      "scalableTargetDimensionId": resourceName,
      "scalingStatus": scalingStatus,
      "version": "MyVersion"
    }
    
    # failureReason is optional: only add it when the variable was set earlier
    try:
        returningJson['failureReason'] = failureReason
    except NameError:
        pass

(That's the way it's defined in the sample.)
In my implementation I had played around with it (before everything was deployed and done), thinking I could get more data out of the GET call for metrics and monitoring...
It turns out that once everything else was done, all I had to do was restore the original return value, and it all connected successfully.
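As a rough illustration of keeping that payload fixed, here is a minimal Python sketch. The helper name `build_response` and its parameters are hypothetical, "Successful" is only an example status value, and wrapping the payload in `statusCode`/`body` is an assumption based on the Lambda proxy integration (`aws_proxy`) used in this answer, not a copy of the sample's code.

```python
import json


def build_response(resource_name, actual_capacity, desired_capacity,
                   scaling_status, failure_reason=None):
    """Build a proxy-integration response carrying the payload shown above."""
    payload = {
        "actualCapacity": float(actual_capacity),
        "desiredCapacity": float(desired_capacity),
        "dimensionName": resource_name,
        "resourceName": resource_name,
        "scalableTargetDimensionId": resource_name,
        "scalingStatus": scaling_status,
        "version": "MyVersion",
    }
    # failureReason is only added when there is something to report.
    if failure_reason is not None:
        payload["failureReason"] = failure_reason
    # Assumption: API Gateway proxy integration expects statusCode + string body.
    return {"statusCode": 200, "body": json.dumps(payload)}


if __name__ == "__main__":
    print(build_response("my-custom-stream", 2, 4, "Successful"))
```

Keeping the payload in one small function makes it harder to add extra fields by accident, which is what broke my own setup.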

2. API Gateway settings and wiring

This part caused me trouble. I think it has to be done exactly right for the Auto Scaling API to connect and find the target so it can register it (that's what the resource you're trying to create does).
This API definition is a valid openapi.yaml definition.
I suggest putting it in a file like this one:

`openapi.yaml.template`

swagger: '2.0'
info:
  title: "${NAME}"
paths:
  '/scalableTargetDimensions/{scalableTargetDimensionId}':
    get:
      tags:
        - ScalableTargets
      x-tags:
        - tag: ScalableTargets
      security:
        - sigv4: []
      x-amazon-apigateway-any-method:
        produces:
          - application/json
        consumes:
          - application/json
      x-amazon-apigateway-integration:
        httpMethod: POST
        type: aws_proxy
        uri: ${INTEGRATION_URI}
        responses: {}
    patch:
      tags:
        - ScalableTargets
      x-tags:
        - tag: ScalableTargets
      security:
        - sigv4: []
      x-amazon-apigateway-any-method:
        security:
          - sigv4: []
        produces:
          - application/json
        consumes:
          - application/json
      x-amazon-apigateway-integration:
        httpMethod: POST
        type: aws_proxy
        uri: ${INTEGRATION_URI}
        responses: {}
securityDefinitions:
  sigv4:
    type: apiKey
    name: Authorization
    in: header
    x-amazon-apigateway-authtype: awsSigv4

And you can then use it in this way:

# API Gateway
resource "aws_api_gateway_rest_api" "gateway" {
  name = var.rest_api_name
  body = templatefile("${path.module}/openapi.yaml.template",
    {
      NAME            = var.rest_api_name,
      INTEGRATION_URI = var.integration_uri
    }
  )
}

resource "aws_api_gateway_deployment" "gateway" {
  depends_on = [
    aws_api_gateway_rest_api.gateway,
  ]

  lifecycle {
    create_before_destroy = true
  }

  rest_api_id = aws_api_gateway_rest_api.gateway.id
  stage_name  = var.stage_name
}
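Before running terraform apply, it can help to preview what the rendered API definition looks like. The sketch below is only an approximation: it uses Python's `string.Template`, which happens to share the `${NAME}` placeholder syntax for simple variables but does not implement Terraform's template language (directives, escapes, expressions). The function name and the sample values are hypothetical.

```python
from string import Template


def render_openapi_template(template_text: str, name: str, integration_uri: str) -> str:
    """Substitute ${NAME} and ${INTEGRATION_URI}; raises KeyError if a placeholder is unknown."""
    return Template(template_text).substitute(NAME=name, INTEGRATION_URI=integration_uri)


if __name__ == "__main__":
    # A shortened stand-in for openapi.yaml.template, for illustration only.
    sample = (
        "info:\n"
        '  title: "${NAME}"\n'
        "paths:\n"
        "  '/scalableTargetDimensions/{scalableTargetDimensionId}':\n"
        "    get:\n"
        "      x-amazon-apigateway-integration:\n"
        "        uri: ${INTEGRATION_URI}\n"
    )
    print(render_openapi_template(sample, "my-api", "EXAMPLE_INTEGRATION_URI"))
```

Path parameters such as `{scalableTargetDimensionId}` carry no `$`, so they pass through untouched.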

3. Correct IAM permissions

The permissions defined in the CloudFormation templates and those that you can create with Terraform are not an exact match and seem to need some tweaking for the integration to work... (I suspect this has to do with some AWS magic, but I find it relatively easy overall to transfer to Terraform.)
So here are the role and policies I ended up creating:

# Lambda
resource "aws_lambda_function" "lambda" {
  # ...
  role = aws_iam_role.kinesis_autoscaler_lambda_role.arn
  # ...
}
resource "aws_iam_role" "kinesis_autoscaler_lambda_role" {
  name               = "${var.env}-kinesis-scaler-lambda-role"
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_lambda_permission" "kinesis_api" {
  statement_id  = "AllowKinesisAPIInvoke"
  function_name = aws_lambda_function.lambda.function_name
  action        = "lambda:InvokeFunction"
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_deployment.gateway.execution_arn}/GET/scalableTargetDimensions/{scalableTargetDimensionId}"
}

resource "aws_lambda_permission" "kinesis_api_patch" {
  statement_id  = "AllowKinesisAPIPatchInvoke"
  function_name = aws_lambda_function.lambda.function_name
  action        = "lambda:InvokeFunction"
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_deployment.gateway.execution_arn}/PATCH/scalableTargetDimensions/{scalableTargetDimensionId}"
}

# Permissions
resource "aws_iam_policy" "lambda_access_stream" {
  name = "${var.stream_name}-access-stream-policy"
                                      
  policy = <<POLICY                 
{                           
  "Version": "2012-10-17",                                          
  "Statement": [                                              
    {                                               
      "Sid": "KinesisConsumerAccess",     
      "Effect": "Allow",                                  
      "Action": [                                     
        "kinesis:DescribeStreamConsumer"               
      ],                                               
      "Resource": "${aws_kinesis_stream.stream.arn}/consumer/*:*"
    },                                        
    {                                                                                                                  
      "Sid": "KinesisStreamAccess",      
      "Effect": "Allow",                                                  
      "Action": [                                             
        "kinesis:DescribeLimits",                               
        "kinesis:DescribeStream",                                                                                     
        "kinesis:DescribeStreamConsumer",               
        "kinesis:DescribeStreamSummary",                                                                               
        "kinesis:UpdateShardCount"                      
      ],                                                      
      "Resource": "${aws_kinesis_stream.stream.arn}" 
    },                               
    {                                                              
      "Sid": "SSMParameterStoreGet",                                                                                
      "Effect": "Allow",                                  
      "Action": [    
        "ssm:GetParameter"           
      ],                                           
      "Resource": [
        "${aws_ssm_parameter.number_of_shards.arn}"
      ]
    },                                             
    {                                                     
      "Sid": "SSMParameterStorePut",                      
      "Effect": "Allow",                                                                                               
      "Action": [                            
        "ssm:PutParameter"                                    
      ],                                
      "Resource": [                                                         
        "${aws_ssm_parameter.number_of_shards.arn}"
      ]                          
    }                                  
  ]                                 
}                                    
POLICY                                
}                  
# I ended up splitting the policies for `module` reasons...                              
resource "aws_iam_policy_attachment" "attach_lambda_stream_access" {
  name = "${aws_iam_role.kinesis_autoscaler_lambda_role.name}_attach_lambda_stream_access"
  roles = [                                         
    aws_iam_role.kinesis_autoscaler_lambda_role.name                  
  ]                                                       
  policy_arn = aws_iam_policy.lambda_access_stream.arn
}

This last one is important. It's the result of the final tweaks I made to get the lambda working.

resource "aws_iam_policy" "lambda_access_scaling" {
  name = "${var.stream_name}-lambda-access-scaling-policy"

  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FindScalingPolicyARNAndAlarms",
      "Effect": "Allow",
      "Action": [
        "application-autoscaling:DescribeScalingPolicies",
        "cloudwatch:DescribeAlarms"
      ],
      "Resource": "*"
    },
    {
      "Sid": "UpdateAlarms",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricAlarm",
        "cloudwatch:DeleteAlarms"
      ],
      "Resource": [
        "${aws_cloudwatch_metric_alarm.alarm_out.arn}",
        "${aws_cloudwatch_metric_alarm.alarm_in.arn}"
      ]
    }
  ]
}
POLICY
}

resource "aws_iam_policy_attachment" "attach_scaling_access" {
  name = "${aws_iam_role.kinesis_autoscaler_lambda_role.name}_attach_scaling_access"
  roles = [
    aws_iam_role.kinesis_autoscaler_lambda_role.name
  ]
  policy_arn = aws_iam_policy.lambda_access_scaling.arn
}

Also, you want to have the right permissions for the custom_appautoscaling...

# For reference:  
resource "aws_appautoscaling_target" "kinesis_stream" {
  min_capacity       = var.min_number_of_shard
  max_capacity       = var.max_number_of_shard
  resource_id        = "${aws_api_gateway_deployment.gateway.invoke_url}/scalableTargetDimensions/${var.stream_name}"           
  role_arn           = aws_iam_role.custom_appautoscaling_service_role.arn
  scalable_dimension = "custom-resource:ResourceType:Property"
  service_namespace  = "custom-resource"
                                                                   
  depends_on = [                                        
    aws_iam_policy_attachment.attach_base_policy,
  ]                                                     
                 
  lifecycle {                                        
    ignore_changes = [
      # This is because the "assume_role_policy" becomes the actual
      # Role "AWSServiceRoleForApplicationAutoScaling_CustomResource"
      # at runtime and is always attempted to be recreated
      role_arn,
    ]              
  }
}                         
          
# Actual policies:
 
resource "aws_iam_role" "custom_appautoscaling_service_role" {
  name               = "${var.stream_name}-assume-custom-resource"
  assume_role_policy = <<-EOF                   
  {        
    "Version": "2012-10-17",                            
    "Statement": [
      {                                      
        "Action": "sts:AssumeRole",
        "Principal": {
          "Service": "custom-resource.application-autoscaling.amazonaws.com"
        },                                       
        "Effect": "Allow",
        "Sid": ""  
      }
    ]                     
  }             
  EOF
}                           
                        
resource "aws_iam_policy" "base_policy" {
  name = "${var.stream_name}_base_policy"
                                 
  policy = <<POLICY
{                  
  "Version": "2012-10-17",                             
  "Statement": [                                     
    {  
      "Sid": "DescribeAlarms",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarms"
      ],
      "Resource": "*"
    },                                                             
    {                                                   
      "Sid": "InvokeApiGateway",
      "Effect": "Allow",                                
      "Action": [
        "execute-api:Invoke*"                        
      ],
      "Resource": [
        "${aws_api_gateway_deployment.gateway.execution_arn}/scalableTargetDimensions/${var.stream_name}"
      ]                                                   
    }
  ]                
}
POLICY                    
}               
     
resource "aws_iam_policy_attachment" "attach_base_policy" {
  name = "${var.stream_name}_attach_base_policy"
  roles = [      
    aws_iam_role.custom_appautoscaling_service_role.name  
  ]                                
  policy_arn = aws_iam_policy.base_policy.arn
}                    
      
resource "aws_iam_policy" "alarms_modification" {
  name = "${var.stream_name}_alarms_modification"
                        
  policy = <<POLICY
{                                   
  "Version": "2012-10-17",       
  "Statement": [
    {              
      "Sid": "UpdateAlarms",                           
      "Effect": "Allow",                             
      "Action": [
        "cloudwatch:PutMetricAlarm",
        "cloudwatch:DeleteAlarms"
      ],
      "Resource": [
        "${aws_cloudwatch_metric_alarm.alarm_out.arn}",
        "${aws_cloudwatch_metric_alarm.alarm_in.arn}"
      ]                                                       
    }                                                   
  ]        
}                       
POLICY
}                                                      
 
resource "aws_iam_policy_attachment" "attach_alarms_modification" {                        
  name = "${var.stream_name}_attach_alarms_modification"
  roles = [                                            
    aws_iam_role.custom_appautoscaling_service_role.name
  ]
  policy_arn = aws_iam_policy.alarms_modification.arn
}
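Heredoc policies like the ones above are plain JSON once the interpolations are resolved, and a single stray comma is enough to make IAM reject them. As a quick check, the following Python sketch parses a rendered policy and lists its Sid values. The function name `check_policy_json` is hypothetical, and it assumes you paste in the policy text with the `${...}` references already replaced by real ARNs.

```python
import json


def check_policy_json(policy_text: str) -> list:
    """Return the Sid of every statement, or raise ValueError if the text is not valid JSON."""
    try:
        doc = json.loads(policy_text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"policy is not valid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    statements = doc.get("Statement", [])
    # IAM allows a single statement object as well as a list of statements.
    if isinstance(statements, dict):
        statements = [statements]
    return [statement.get("Sid", "") for statement in statements]


if __name__ == "__main__":
    rendered = """
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DescribeAlarms",
          "Effect": "Allow",
          "Action": ["cloudwatch:DescribeAlarms"],
          "Resource": "*"
        }
      ]
    }
    """
    print(check_policy_json(rendered))
```

This only checks JSON syntax; it does not validate action names or ARNs.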

4. Lambda and APIGateway in good working order

Finally, if it still doesn't work after all of this, you might want to troubleshoot your lambda and APIGateway...
I ended up using the apigateway-aws-proxy Event Template in the lambda console, editing the fields a bit so that it looks like this:

{
  "body": {
    "desiredCapacity": "1"
  },
  "resource": "/{proxy+}",
  "path": "/scalableTargetDimensions/my-custom-stream",
  "httpMethod": "PATCH",
...
  "requestContext": {
    ...
    "path": "/<STAGE_NAME_SEE_API_GATEWAY_DEPLOYMENT_RESOURCE>/scalableTargetDimensions/my-custom-stream",
    "resourcePath": "/{proxy+}",
    "httpMethod": "PATCH",
}

Also for this part, I created a second event with a base64-encoded body (because that's how it's passed by the APIGateway, I guess?)...
Anyway, I ended up tweaking the lambda a bit so that it accepts both; this way it's possible to know whether the lambda actually works properly.

{
  "body": "eyJkZXNpcmVkQ2FwYWNpdHkiOiIyIn0=",
  "resource": "/{proxy+}",
  "path": "/scalableTargetDimensions/my-custom-stream",
  "httpMethod": "PATCH",
  "isBase64Encoded": true,
...
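For reference, accepting both body styles can be sketched roughly like this in Python. This is not the code from the sample: the helper name `parse_body` is hypothetical, and it simply assumes the event carries `body` and, optionally, `isBase64Encoded` as in the two test events above.

```python
import base64
import json


def parse_body(event: dict) -> dict:
    """Return the request body as a dict, whether it arrives as a dict,
    a JSON string, or a base64-encoded JSON string."""
    body = event.get("body")
    if body is None or body == "":
        return {}
    # The console test event above passes the body as a JSON object directly.
    if isinstance(body, dict):
        return body
    # Base64-encoded bodies are flagged with isBase64Encoded.
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


if __name__ == "__main__":
    print(parse_body({"body": {"desiredCapacity": "1"}}))
    print(parse_body({"body": "eyJkZXNpcmVkQ2FwYWNpdHkiOiIyIn0=", "isBase64Encoded": True}))
```

The second call decodes the base64 string from the test event above into its `desiredCapacity` payload.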

With these test events, you should be able to troubleshoot the lambda and make sure all permissions are properly granted.
To troubleshoot the APIGateway, you could always use Postman. It handles authentication to AWS nicely, so if you have credentials with adequate access to the APIGateway resource you've created, you should be able to send some GET and PATCH requests to manually trigger the APIGateway and test this part of the integration.