3
votes

I am using Amazon Web Services and trying to run an ECS Task Definition on a Cluster triggered from a Lambda.

When I run this task manually in the ECS console and choose all of the same options I'm passing to run_task, it runs just fine: I see logs in CloudWatch and the effects of the task (updating a database) happen as expected. But when I run the task from a Lambda it does not work, and it also gives me no errors that I can see.

Here's the Lambda definition:

import boto3

def lambda_handler(event, context):
    print("howMuchSnowDoUpdate")
    client = boto3.client('ecs')
    response = client.run_task(
        cluster='HowMuchSnow',
        taskDefinition='HowMuchSnow:2',
        count=1,
        launchType='FARGATE',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': [
                    'subnet-ebce7c8c',
                ],
                'securityGroups': [
                    'sg-03bb63bf7b3389d42',
                ],
                'assignPublicIp': 'DISABLED'
            }
        },
    )
    print(response)

I have given the Lambda's IAM role the ECSFull policy. Before I did, I was getting an expected permission-denied error when calling run_task. But once I added that policy, the Lambda runs just fine with no errors reported, and this is the response I get from that print(response) line:

{'tasks': [{'taskArn': 'arn:aws:ecs:us-east-1:221691463461:task/10b2473f-482d-4f75-ab43-3980f6995b17', 'clusterArn': 'arn:aws:ecs:us-east-1:221691463461:cluster/HowMuchSnow', 'taskDefinitionArn': 'arn:aws:ecs:us-east-1:221691463461:task-definition/HowMuchSnow:2', 'overrides': {'containerOverrides': [{'name': 'HowMuchSnow'}]}, 'lastStatus': 'PROVISIONING', 'desiredStatus': 'RUNNING', 'cpu': '256', 'memory': '512', 'containers': [{'containerArn': 'arn:aws:ecs:us-east-1:221691463461:container/9a76562b-1fef-457f-ae04-0f0eb4003e7b', 'taskArn': 'arn:aws:ecs:us-east-1:221691463461:task/10b2473f-482d-4f75-ab43-3980f6995b17', 'name': 'HowMuchSnow', 'lastStatus': 'PENDING', 'networkInterfaces': []}], 'version': 1, 'createdAt': datetime.datetime(2019, 6, 17, 14, 57, 29, 831000, tzinfo=tzlocal()), 'group': 'family:HowMuchSnow', 'launchType': 'FARGATE', 'platformVersion': '1.3.0', 'attachments': [{'id': 'e6ec4941-9e91-47d1-adff-d406f28b1931', 'type': 'ElasticNetworkInterface', 'status': 'PRECREATED', 'details': [{'name': 'subnetId', 'value': 'subnet-ebce7c8c'}]}]}], 'failures': [], 'ResponseMetadata': {'RequestId': '3a2506ef-9110-11e9-b57a-d7e334b6f5f7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3a2506ef-9110-11e9-b57a-d7e334b6f5f7', 'content-type': 'application/x-amz-json-1.1', 'content-length': '1026', 'date': 'Mon, 17 Jun 2019 14:57:29 GMT'}, 'RetryAttempts': 0}}

To my eyes this looks alright, but the task never actually runs. I do briefly see a pending task in the tasks list in the ECS console for my cluster, but it runs for nowhere near as long as the actual task should. It produces no logs in CloudWatch like it does when I run it manually, and I see no errors in the logs either.

One thing I will note is that I have to pick a VPC when running the task manually from the console, but that's not a valid argument to boto3's ECS run_task function, so I don't pass it.
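
For what it's worth, my understanding is that the VPC is implied by the subnets in networkConfiguration, which is presumably why run_task has no VPC argument. A quick sketch to confirm which VPC a subnet belongs to, using the subnet ID from my code above:

import boto3

# Look up the subnet to see which VPC it lives in; run_task infers
# the VPC from the subnets passed in networkConfiguration.
ec2 = boto3.client('ec2')
subnet = ec2.describe_subnets(SubnetIds=['subnet-ebce7c8c'])['Subnets'][0]
print(subnet['VpcId'])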

Anyone know what might be going wrong or where I might look for information?

2
I have successfully created Lambdas that kick off Fargate tasks, and your run_task command looks fine. I have a feeling it has something to do with permissions. Are you setting any advanced options when you run the task manually via the console? What is the status of the task after it's completed? - JD D
You got the PENDING status back, so the task was created. If you take that taskArn and make a describe-tasks call on it a couple of times after the fact, I am sure you can get more information about why it won't place. - Brett Green
JD D, the task statuses go from (current: PENDING, desired: RUNNING) to (current: DEPROVISIONING (failed to start), desired: STOPPED) to (current: STOPPED (failed to start), desired: STOPPED) - mmachenry
Brett Green, I have now noticed on the stopped tasks page that I get this error: "Status reason CannotPullContainerError: Error response from daemon: Get 221691463461.dkr.ecr.us-east-1.amazonaws.com/v2: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)". I also found github.com/aws/amazon-ecs-agent/issues/1654. I enabled assignPublicIp, which fixed my initial problem, and the Docker container now runs. I'm unclear as to why, since I don't need a public IP; it's a cron job that's not serving anything. - mmachenry
Also, my Lambda now terminates in an error status with "Task timed out after 3.00 seconds" when I run the test, despite doing everything I wanted it to. Not sure why that's the case. - mmachenry
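
Following up on Brett Green's suggestion, here's a minimal sketch that polls describe_tasks until the task stops and prints its stoppedReason, using the task ARN from the run_task response above:

import boto3
import time

ecs = boto3.client('ecs')

# Task ARN copied from the run_task response printed in the question.
task_arn = 'arn:aws:ecs:us-east-1:221691463461:task/10b2473f-482d-4f75-ab43-3980f6995b17'

# Poll until the task reaches STOPPED, then print why it stopped.
while True:
    task = ecs.describe_tasks(cluster='HowMuchSnow', tasks=[task_arn])['tasks'][0]
    print(task['lastStatus'])
    if task['lastStatus'] == 'STOPPED':
        # stoppedReason is where errors like CannotPullContainerError show up.
        print(task.get('stoppedReason'))
        break
    time.sleep(5)

As for the "Task timed out after 3.00 seconds" message: that is the Lambda function's own timeout, which defaults to 3 seconds. run_task returns before the ECS task finishes, but a cold start plus the API call can take longer than 3 seconds, so raising the function timeout in its configuration should clear the error.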

2 Answers

4
votes

Here's what works for me.

When setting up Lambda:

  • Role must have permissions to call ecs:RunTask (plus iam:PassRole if the task definition uses a task or execution role)
  • Don't specify a VPC in the Lambda function settings itself

Here's the Lambda code (replace the subnets, security groups, etc. with your own).

import boto3

client = boto3.client('ecs')

cluster_name = "demo-cluster"
task_definition = "demo-task:1"

def lambda_handler(event, context):
    try:
        response = client.run_task(
            cluster=cluster_name,
            launchType='FARGATE',
            taskDefinition=task_definition,
            count=1,
            platformVersion='LATEST',
            networkConfiguration={
                'awsvpcConfiguration': {
                    'subnets': [
                        'subnet-0r6gh701',
                        'subnet-a73d7c10'
                    ],
                    'securityGroups': [
                        'sg-54cb123f',
                    ],
                    # ENABLED so the task can reach ECR to pull its image.
                    'assignPublicIp': 'ENABLED'
                }
            })

        print(response)

        return {
            'statusCode': 200,
            'body': "OK"
        }
    except Exception as e:
        print(e)

        return {
            'statusCode': 500,
            'body': str(e)
        }
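
Note that assignPublicIp='ENABLED' is doing real work here: a Fargate task in a subnet with no path to the internet (no public IP and no NAT gateway or ECR VPC endpoints) cannot pull its image, which is exactly the CannotPullContainerError mentioned in the comments above. It's also worth knowing that run_task returns HTTP 200 even when the task is rejected; rejections land in the failures list of the response instead of raising an exception. A sketch with a hypothetical wrapper:

import boto3

client = boto3.client('ecs')

def run_task_checked(**kwargs):
    # Hypothetical helper: run_task reports placement problems in the
    # 'failures' list of an otherwise successful response, so raise if
    # that list is non-empty instead of letting it pass silently.
    response = client.run_task(**kwargs)
    if response.get('failures'):
        raise RuntimeError('run_task failures: %s' % response['failures'])
    return response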

0
votes

I had this problem and it turned out that I had commented out the CMD line at the end of my Dockerfile while debugging. As a result the Lambda ran, but the ECS task never logged anything. Uncommenting the CMD led to the ECS task running and logging again.