3
votes

I have two AWS CloudFormation stacks: one for IAM roles, and a second that creates an AWS service and imports the respective roles into it.

When 10+ services are deployed, the following error appears randomly on 1 or 2 of the services:

AWS::ECS::Service service Unable to assume role and validate the listeners configured on your load balancer. Please verify that the ECS service role being passed has the proper permissions.

If all the services are torn down and redeployed to the ECS cluster, the error appears again, but for different services.

The AWS fix for this can be seen here

If the 1 or 2 broken services are torn down and redeployed, they deploy without issue. So the problem appears to occur only when many services are deployed at the same time, which suggests it may be an IAM propagation timing issue within CloudFormation.

I've tried adding a DependsOn attribute to the service definition:

"service" : {
    "Type" : "AWS::ECS::Service",
    "DependsOn" : [
        "taskdefinition",
        "ECSServiceRole"
    ],
    "Properties" : {
        "Cluster" : { "Ref" : "ECSCluster" },
        "Role" : { "Ref" : "ECSServiceRole" },
        etc...
    }
}

But this doesn't work.

As you can see above, I've also removed the imported value for the ECSServiceRole and replaced it with a role defined in the same template, with an inline policy:

"ECSServiceRole" : {
    "Type" : "AWS::IAM::Role",
    "Properties" : {
        "AssumeRolePolicyDocument" : {
            "Statement" : [
                {
                    "Sid": "",
                    "Effect" : "Allow",
                    "Principal" : {
                        "Service" : [
                            "ecs.amazonaws.com"
                        ]
                    },
                    "Action" : [
                        "sts:AssumeRole"
                    ]
                }
            ]
        },
        "Path" : "/",
        "Policies" : [
            {
                "PolicyName" : "ecs-service",
                "PolicyDocument" : {
                    "Statement" : [
                        {
                            "Effect" : "Allow",
                            "Action" : [
                                "ec2:Describe*",
                                "ec2:AuthorizeSecurityGroupIngress",
                                "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
                                "elasticloadbalancing:DeregisterTargets",
                                "elasticloadbalancing:Describe*",
                                "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
                                "elasticloadbalancing:RegisterTargets",
                                "sns:*"
                            ],
                            "Resource" : "*"
                        }
                    ]
                }
            }
        ]
    }
}

But again - the inline policy doesn't fix the issue either.

Any ideas or pointers would be much appreciated!

In reply to answer 1.

Thank you - I wasn't aware of this improvement.

Is this the correct way to associate the service-linked role for ECS?

"ECSServiceRole": {
    "Type": "AWS::IAM::Role",
    "Properties": {
        "AssumeRolePolicyDocument": {
            "Statement": [
                {
                    "Sid": "",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": [
                            "ecs.amazonaws.com"
                        ]
                    },
                    "Action": [
                        "sts:AssumeRole"
                    ]
                }
            ]
        },
        "Path": "/",
        "Policies": [
            {
                "PolicyName": "CreateServiceLinkedRoleForECS",
                "PolicyDocument": {
                    "Statement": [
                        {
                            "Effect": "Allow",
                            "Action": [
                                "iam:CreateServiceLinkedRole",
                                "iam:PutRolePolicy",
                                "iam:UpdateRoleDescription",
                                "iam:DeleteServiceLinkedRole",
                                "iam:GetServiceLinkedRoleDeletionStatus"
                            ],
                            "Resource": "arn:aws:iam::*:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS*",
                            "Condition": {
                                "StringLike": {
                                    "iam:AWSServiceName": "ecs.amazonaws.com"
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
}

Final Answer

After months of intermittent issues and back-and-forth with AWS on this matter, AWS came back to say they were throttling us in the background on the ELB. This is why the random and varied errors appeared when deploying 3+ Docker services via CloudFormation at the same time. The solution had nothing to do with IAM permissions; rather, it was to have the rate limit on the ELB increased via the "AWS Service Team".
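
If concurrent service creation is what trips the ELB rate limit, one possible stopgap while waiting for a limit increase is to make CloudFormation create the services one after another rather than in parallel. The fragment below is only a sketch under that assumption, written in YAML so the assumptions can be flagged in comments. It presumes both services live in the same template; ServiceA, ServiceB, TaskDefinitionA, TaskDefinitionB, ElbA, ElbB and the container names are hypothetical and do not come from the original templates. It has not been verified against the throttling described above.

```yaml
# Fragment of a Resources section. ECSCluster and ECSServiceRole are the
# resources from the question; every other logical ID is hypothetical.
ServiceA:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref ECSCluster
    Role: !Ref ECSServiceRole
    TaskDefinition: !Ref TaskDefinitionA
    DesiredCount: 1
    LoadBalancers:
      - ContainerName: app-a          # hypothetical container name
        ContainerPort: 80
        LoadBalancerName: !Ref ElbA

ServiceB:
  Type: AWS::ECS::Service
  # Assumption: waiting for ServiceA means the load balancer validation
  # calls for the two services are made in sequence, not simultaneously.
  DependsOn: ServiceA
  Properties:
    Cluster: !Ref ECSCluster
    Role: !Ref ECSServiceRole
    TaskDefinition: !Ref TaskDefinitionB
    DesiredCount: 1
    LoadBalancers:
      - ContainerName: app-b          # hypothetical container name
        ContainerPort: 80
        LoadBalancerName: !Ref ElbB
```

The trade-off is a slower stack creation, since each service waits for the previous one to stabilise.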

2
I think I'm seeing the same intermittent issue as you, but using an ALB with target groups, so my error is slightly different: "Unable to assume role and validate the specified targetGroupArn. Please verify that the ECS service role being passed has the proper permissions. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: ...)". Can I ask which ELB service rate limit you got increased? - Lee Netherton

2 Answers

1
votes

The fix was to keep the two-stack approach in CloudFormation: one stack with the IAM roles, which are then imported into the service layer stack. In the service layer template, I added a DependsOn to the service definition listing all of the other resources in that stack. This allows sufficient time for the IAM roles to be imported and assumed by the service, so this was a CloudFormation resource creation timing issue.

"service" : {
    "Type" : "AWS::ECS::Service",
    "DependsOn" : [
        "TaskDefinition",
        "EcsElasticLoadBalancer",
        "DnsRecord"
    ],
    "Properties" : {
      etc...
    }
}

0
votes

UPDATE: As of July 19th 2018, it is possible to create IAM service-linked roles using CloudFormation: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-iam-servicelinkedrole.html

EcsServiceLinkedRole:
  Type: "AWS::IAM::ServiceLinkedRole"
  Properties:
    AWSServiceName: "ecs.amazonaws.com"
    Description: "Role to enable Amazon ECS to manage your cluster."

OLD ANSWER: Creating your own ECSServiceRole is no longer required. If you don't specify a role for your service, AWS defaults to using the ECS service-linked role. If your AWS account is recent enough, or you have already created a cluster via the console, you don't have to do anything for this to work. If not, run the following command to create the role:

    aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com
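
To illustrate the "not specifying a role" part, a service definition relying on the service-linked role might look roughly like the fragment below. It is a sketch, not a tested template: it assumes the AWSServiceRoleForECS role already exists in the account, and TaskDefinition, the container name and the port are placeholders. ECSCluster and EcsElasticLoadBalancer are the logical IDs used earlier in this thread.

```yaml
# Sketch: ECS service with no Role property, so ECS falls back to the
# AWSServiceRoleForECS service-linked role (assumed to exist already).
Service:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref ECSCluster
    TaskDefinition: !Ref TaskDefinition   # placeholder logical ID
    DesiredCount: 1
    # Deliberately no "Role" key here.
    LoadBalancers:
      - ContainerName: app                # placeholder container name
        ContainerPort: 80
        LoadBalancerName: !Ref EcsElasticLoadBalancer
```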