
I have a CloudFormation template for an AWS Batch POC with 6 resources:

  • 3 AWS::IAM::Role
  • 1 AWS::Batch::ComputeEnvironment
  • 1 AWS::Batch::JobQueue
  • 1 AWS::Batch::JobDefinition

All three AWS::IAM::Role resources have the managed policy "arn:aws:iam::aws:policy/AdministratorAccess" attached (to rule out permission issues).

The roles are used as follows:

  • 1 in the AWS::Batch::ComputeEnvironment
  • 2 in the AWS::Batch::JobDefinition

But even with the "arn:aws:iam::aws:policy/AdministratorAccess" policy, I get "CannotPullContainerError: Error response from daemon: Get https://********.dkr.ecr.eu-west-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" when I run a job.

Disclaimer: everything is FARGATE (compute environment and job), not EC2.

        AWSTemplateFormatVersion: '2010-09-09'
        Description: Creates a POC AWS Batch environment.
        Parameters:
          Environment:
            Type: String
            Description: 'Environment Name'
            Default: TEST
          Subnets:
            Type: List<AWS::EC2::Subnet::Id>
            Description: 'List of Subnets to boot into'
          ImageName:
            Type: String
            Description: 'Name and tag of Process Container Image'
            Default: 'upload:6.0.0'

        Resources:
          BatchServiceRole:
            Type: 'AWS::IAM::Role'
            Properties:
              RoleName: !Join ['', ['Demo', BatchServiceRole]]
              AssumeRolePolicyDocument:
                Version: 2012-10-17
                Statement:
                  - Effect: 'Allow'
                    Principal:
                      Service: 'batch.amazonaws.com'
                    Action: 'sts:AssumeRole'
              ManagedPolicyArns:
                - 'arn:aws:iam::aws:policy/AdministratorAccess'
          BatchContainerRole:
            Type: 'AWS::IAM::Role'
            Properties:
              RoleName: !Join ['', ['Demo', BatchContainerRole]]
              AssumeRolePolicyDocument:
                Version: 2012-10-17
                Statement:
                  - 
                    Effect: 'Allow'
                    Principal:
                      Service:
                        - 'ecs-tasks.amazonaws.com'
                    Action: 
                      - 'sts:AssumeRole'
              ManagedPolicyArns:
                - 'arn:aws:iam::aws:policy/AdministratorAccess'
          BatchJobRole:
            Type: 'AWS::IAM::Role'
            Properties:
              RoleName: !Join ['', ['Demo', BatchJobRole]]
              AssumeRolePolicyDocument:
                Version: 2012-10-17
                Statement:
                  - Effect: 'Allow'
                    Principal:
                      Service: 'ecs-tasks.amazonaws.com'
                    Action: 'sts:AssumeRole'
              ManagedPolicyArns:
                - 'arn:aws:iam::aws:policy/AdministratorAccess'
          BatchCompute:
            Type: "AWS::Batch::ComputeEnvironment"
            Properties:
              ComputeEnvironmentName: DemoContentInput
              ComputeResources: 
                MaxvCpus: 256 
                SecurityGroupIds:
                  - sg-0b33333333333333
                Subnets: !Ref Subnets
                Type: FARGATE
              ServiceRole: !Ref BatchServiceRole
              State: ENABLED
              Type: Managed
          Queue:
            Type: "AWS::Batch::JobQueue"
            DependsOn: BatchCompute
            Properties:
              ComputeEnvironmentOrder: 
                - ComputeEnvironment: DemoContentInput 
                  Order: 1
              Priority: 1
              State: "ENABLED"
              JobQueueName: DemoContentInput
          ContentInputJob:
            Type: "AWS::Batch::JobDefinition"
            Properties:
              Type: Container
              ContainerProperties: 
                Command: 
                  - -v
                  - process
                  - new-file
                  - -o
                  - s3://contents/{content_id}/{content_id}.mp4
                Environment:
                  - Name: SECRETS
                    Value: !Join [ ':', [ '{{resolve:secretsmanager:common.secrets:SecretString:aws_access_key_id}}', '{{resolve:secretsmanager:common.secrets:SecretString:aws_secret_access_key}}' ] ] 
                  - Name: APPLICATION 
                    Value: upload
                  - Name: API_KEY 
                    Value: '{{resolve:secretsmanager:common.secrets:SecretString:fluzo.api_key}}'
                  - Name: CLIENT
                    Value: upload-container
                  - Name: ENVIRONMENT
                    Value: !Ref Environment
                  - Name: SETTINGS
                    Value: !Join [ ':', [ '{{resolve:secretsmanager:common.secrets:SecretString:aws_access_key_id}}', '{{resolve:secretsmanager:common.secrets:SecretString:aws_secret_access_key}}', 'upload-container' ] ] 
                ExecutionRoleArn: 'arn:aws:iam::**********:role/DemoBatchJobRole'
                Image: !Join ['', [!Ref 'AWS::AccountId','.dkr.ecr.', !Ref 'AWS::Region', '.amazonaws.com/', !Ref ImageName ] ] 
                JobRoleArn: !Ref BatchContainerRole
                ResourceRequirements:
                  - Type: VCPU
                    Value: 1
                  - Type: MEMORY
                    Value: 2048
              JobDefinitionName: DemoContentInput
              PlatformCapabilities:
                - FARGATE
              RetryStrategy: 
                Attempts: 1
              Timeout: 
                AttemptDurationSeconds: 600 

In AWS::Batch::JobDefinition:ContainerProperties:ExecutionRoleArn I hardcoded the ARN, because if I write !Ref BatchJobRole I get an error. But that is not the goal of this question.
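As a side note on that error: `!Ref` on an `AWS::IAM::Role` returns the role name, not its ARN, whereas `!GetAtt <LogicalId>.Arn` returns the full ARN, which is likely why `!Ref BatchJobRole` is rejected in `ExecutionRoleArn`. A minimal sketch of the two role properties without the hardcoded ARN, assuming the logical IDs from the template above:

```yaml
Resources:
  ContentInputJob:
    Type: "AWS::Batch::JobDefinition"
    Properties:
      ContainerProperties:
        # !GetAtt <LogicalId>.Arn returns the role's full ARN;
        # !Ref on an AWS::IAM::Role would return only the role name.
        ExecutionRoleArn: !GetAtt BatchJobRole.Arn
        JobRoleArn: !GetAtt BatchContainerRole.Arn
        # ...the rest of ContainerProperties stays as in the template above
```

Because `!GetAtt` creates an implicit dependency, CloudFormation also creates the roles before the job definition without any extra `DependsOn`.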

The question is: how do I avoid "CannotPullContainerError: Error response from daemon: Get https://********.dkr.ecr.eu-west-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" when I run a job?

I think the connection timeout is related to network issues. How about checking the routing, the NAT GW and the security group? (Franxi Hidro)
Are your !Ref Subnets public subnets or private ones? How is your VPC configured? (Marcin)

1 Answer


It sounds like you can't reach the internet from inside your subnet.
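On Fargate specifically, if the subnets you pass in are public (their default route points at an internet gateway), the job only gets a public IP address when the job definition requests one; `AssignPublicIp` defaults to `DISABLED`, and without a public IP the task cannot use the internet gateway route to reach ECR. A minimal sketch of the relevant `ContainerProperties` fragment, assuming public subnets and no NAT gateway:

```yaml
Resources:
  ContentInputJob:
    Type: "AWS::Batch::JobDefinition"
    Properties:
      PlatformCapabilities:
        - FARGATE
      ContainerProperties:
        # Give the Fargate task a public IP so that, in a public subnet,
        # image pulls from ECR can leave through the internet gateway.
        # The default is DISABLED. In a private subnet, keep it DISABLED
        # and route 0.0.0.0/0 through a NAT gateway instead.
        NetworkConfiguration:
          AssignPublicIp: ENABLED
        # ...the rest of ContainerProperties unchanged
```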

Make sure:

  • There is an internet gateway attached to your VPC (create one if there isn't; you need it even if you only use a NAT gateway for egress, because the NAT gateway itself must sit in a public subnet that routes to the internet gateway).
  • The route table associated with your subnet has a default route (0.0.0.0/0) to an internet gateway, or to a NAT gateway with an attached Elastic IP.
  • An attached security group has rules allowing outbound internet traffic (0.0.0.0/0) for your ports and protocols (e.g. 443/HTTPS, which ECR uses, and 80/HTTP if needed).
  • The network access control list (network ACL) associated with the subnet allows both the outbound traffic to the internet and the inbound return traffic (network ACLs are stateless, so inbound ephemeral ports 1024-65535 must be allowed too).
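The checklist above can be expressed in CloudFormation roughly as follows. This is a sketch for a public-subnet setup, assuming a VPC that does not yet have these pieces; `VpcId`, `PublicSubnetId` and `BatchSecurityGroup` are hypothetical names (the last one would replace the hardcoded `sg-0b33333333333333`). The network ACL is not shown because the default ACL already allows all traffic.

```yaml
Parameters:
  VpcId:            # hypothetical parameter, not in the question's template
    Type: AWS::EC2::VPC::Id
  PublicSubnetId:   # hypothetical parameter, not in the question's template
    Type: AWS::EC2::Subnet::Id

Resources:
  # Internet gateway attached to the VPC
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VpcId
      InternetGatewayId: !Ref InternetGateway

  # Route table with a default route (0.0.0.0/0) to the internet gateway
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VpcId
  DefaultRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  # Fails if the subnet is already explicitly associated with another route table
  PublicSubnetRouteAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetId
      RouteTableId: !Ref PublicRouteTable

  # Security group allowing outbound HTTPS (ECR is reached over 443).
  # Specifying SecurityGroupEgress replaces the default allow-all egress rule.
  BatchSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Outbound HTTPS for Batch Fargate jobs
      VpcId: !Ref VpcId
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
```

With an internet gateway route like this, a Fargate job in that subnet still needs a public IP (`AssignPublicIp: ENABLED` in the job definition's `NetworkConfiguration`); for a private subnet, point the default route at a NAT gateway instead.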

References:

https://aws.amazon.com/premiumsupport/knowledge-center/ec2-connect-internet-gateway/