
I have a Lambda function with a Python handler that submits a job to AWS Batch via the boto3 client:

import hashlib
import logging
import os
import time

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

client = boto3.client('batch', 'us-east-1')


def handle_load(event, context):
    # Derive a short job name from a hash of the current timestamp.
    hasher = hashlib.sha1()
    hasher.update(str(time.time()).encode())
    job_name = f"job-{hasher.hexdigest()[:10]}"
    job_queue = os.environ.get("job_queue")
    job_definition = os.environ.get("job_definition")

    logger.info(f"Submitting job named '{job_name}' to queue '{job_queue}' "
                f"with definition '{job_definition}'")

    response = client.submit_job(
        jobName=job_name,
        jobQueue=job_queue,
        jobDefinition=job_definition,
    )

    logger.info(f"Submission successful, job ID: {response['jobId']}")

In the CloudWatch logs I can see the Lambda function log the submission message, but it always times out before the response comes back. These jobs never show up in the queue, so I'm not sure where they go after they are submitted. Apart from the timeout, I have little else to go on.
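A hang that ends only at the function timeout usually points to a network problem rather than a permissions problem. A missing permission would normally fail quickly with an access-denied error from the API. Also, botocore's default connect timeout is 60 seconds and failed connections are retried, so a blocked connection can easily outlast the function timeout. One way to check is to test raw TCP reachability of the Batch endpoint from inside the handler.

Below is a minimal standard-library sketch. `can_reach` is a hypothetical helper name, and the hostname `batch.us-east-1.amazonaws.com` assumes the standard regional endpoint naming.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper: it only opens and closes a TCP
    connection and does not make any AWS API call.
    """
    try:
        # create_connection resolves the name and connects; DNS failures,
        # refusals and timeouts are all raised as OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_reach("batch.us-east-1.amazonaws.com")` at the start of `handle_load` and logging the result should return `False` after a few seconds when the function has no route out of its VPC. If you prefer failing fast on the real call instead, `botocore.config.Config` accepts `connect_timeout` and `retries` settings that can be passed to `boto3.client`.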

I have successfully added a job to the queue via the AWS CLI, using the same queue and definition ARNs that are used in the Lambda's Python code. This job shows up in the queue under the Runnable tab (presumably it will be started at some point in the near future).

The job submission with the AWS CLI comes back instantly, so something in the Lambda configuration must be preventing the job submission. Perhaps I'm not using the correct role for the Lambda that submits the job, or some other missing permission is causing the timeout? The Lambda has permission for the batch:SubmitJob action on all resources.

Have you adjusted your Lambda's timeout? The default is 3 seconds. - Marcin
Yes, the timeout is 300 seconds. - James Adams
So the actual call client.submit_job() is timing out? Is your Lambda in a VPC? - Marcin
The second log message is never seen and the CloudWatch log shows a timeout after 300 seconds, so I assume the response never comes back before the timeout? The Lambda is in a VPC. Do you ask because I may need to configure the Batch service to use the same VPC? If so, should this be set on the compute environment, job queue, and/or job definition? - James Adams
Yes. A Lambda in a VPC requires special treatment. I added more info in the answer. - Marcin

2 Answers


If an AWS Lambda function is not connected to a VPC, then by default it is connected to the Internet. This means it can call AWS APIs, whose endpoints reside on the Internet.

If your Lambda function is configured to use a VPC, it will not have Internet access by default. This is good for connecting to other resources in a VPC, but if you wish to communicate with an AWS service, you'll need either:

  • A NAT Gateway in a public subnet, with the Lambda function connected to a private subnet that has a Route Table rule that points to the NAT Gateway, or
  • A VPC endpoint that connects to the desired service. Unfortunately, AWS Batch does not have a VPC Endpoint.
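To check whether a subnet used by the function actually has the NAT route described in the first option, you can inspect its route table. The sketch below assumes the input is one entry of the `RouteTables` list returned by EC2 `DescribeRouteTables` (as in boto3's `describe_route_tables`). `has_nat_default_route` is a hypothetical helper name.

```python
from typing import Any, Dict


def has_nat_default_route(route_table: Dict[str, Any]) -> bool:
    """Return True if the route table sends 0.0.0.0/0 to an active NAT gateway.

    `route_table` is assumed to have the shape of one item in the
    "RouteTables" list from EC2 DescribeRouteTables.
    """
    for route in route_table.get("Routes", []):
        is_default = route.get("DestinationCidrBlock") == "0.0.0.0/0"
        via_nat = bool(route.get("NatGatewayId"))
        # A route whose target was deleted shows up with State "blackhole".
        is_active = route.get("State", "active") == "active"
        if is_default and via_nat and is_active:
            return True
    return False
```

With a boto3 EC2 client, the table for a given subnet can be fetched with `describe_route_tables(Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}])`. A subnet without an explicit association uses the VPC's main route table, which that filter does not return.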

So, if your Lambda function does not need to connect to other resources in the VPC, you can disconnect it and it should work. Otherwise, use a NAT Gateway.
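Disconnecting the function from the VPC can be done by updating its configuration with empty subnet and security-group lists. The sketch below assumes `lambda_client` is a boto3 Lambda client (`boto3.client("lambda")`). `detach_from_vpc` is a hypothetical wrapper name.

```python
from typing import Any, Dict


def detach_from_vpc(lambda_client: Any, function_name: str) -> Dict[str, Any]:
    """Remove a Lambda function's VPC configuration.

    `lambda_client` is assumed to be a boto3 Lambda client. Empty
    SubnetIds and SecurityGroupIds lists disconnect the function from
    its VPC, restoring the default Internet access.
    """
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        VpcConfig={"SubnetIds": [], "SecurityGroupIds": []},
    )
```

The function then reaches the public Batch endpoint directly, but it also loses access to any private resources in the VPC. That trade-off is why the NAT Gateway is the alternative.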


Based on the comments: a Lambda function in a VPC does not have access to the internet. To reach the AWS Batch endpoints, you need an internet gateway attached to the VPC, a NAT gateway in a public subnet, and your Lambda function in a private subnet whose route table sends internet-bound traffic to the NAT gateway. Alternatively, you can use a VPC interface endpoint for AWS Batch. From the docs:

Connect your function to private subnets to access private resources. If your function needs internet access, use NAT. Connecting a function to a public subnet does not give it internet access or a public IP address.
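Both options can be sketched with a boto3 EC2 client. The helper names are hypothetical, and all IDs are placeholders you would substitute. The NAT route assumes the NAT gateway already exists in a public subnet whose own route table points 0.0.0.0/0 at the internet gateway. The endpoint service name `com.amazonaws.<region>.batch` is an assumption to verify for your region.

```python
from typing import Any, Dict, List


def route_private_subnet_through_nat(ec2_client: Any, route_table_id: str,
                                     nat_gateway_id: str) -> Dict[str, Any]:
    """Send internet-bound traffic from a private subnet's route table to a NAT gateway."""
    return ec2_client.create_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_gateway_id,
    )


def create_batch_interface_endpoint(ec2_client: Any, vpc_id: str,
                                    subnet_ids: List[str],
                                    security_group_ids: List[str],
                                    region: str = "us-east-1") -> Dict[str, Any]:
    """Create an interface VPC endpoint for AWS Batch (service name is assumed)."""
    return ec2_client.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.batch",  # assumed service name
        SubnetIds=subnet_ids,
        SecurityGroupIds=security_group_ids,
        # Private DNS lets the default Batch hostname resolve to the endpoint,
        # so the Lambda code can keep using boto3.client('batch', ...) unchanged.
        PrivateDnsEnabled=True,
    )
```

Use one option or the other. For the endpoint route, the endpoint's security group must allow inbound HTTPS (port 443) from the Lambda function's security group.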

You also need to add permissions to your Lambda's execution role so that it can create network interfaces in the VPC:

  • ec2:CreateNetworkInterface

  • ec2:DescribeNetworkInterfaces

  • ec2:DeleteNetworkInterface
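As a sketch, the three actions above can be expressed as an inline IAM policy document attached to the execution role. The `"Resource": "*"` scope is an assumption, chosen because these network-interface actions are commonly granted account-wide. AWS also offers the managed policy `AWSLambdaVPCAccessExecutionRole`, which includes these actions along with CloudWatch Logs permissions. `vpc_access_policy` is a hypothetical helper name.

```python
import json


def vpc_access_policy() -> str:
    """Build an IAM policy document (JSON string) granting the ENI actions a VPC-connected Lambda needs."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateNetworkInterface",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DeleteNetworkInterface",
                ],
                # Assumed scope; narrow it if your account policy requires.
                "Resource": "*",
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting string could be passed as `PolicyDocument` to IAM `put_role_policy` for the function's execution role.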