I am trying to create the following architecture: a VPC with two subnets, one public (containing a NatGateway and an InternetGateway) and one private.

I start a Fargate service in the private subnet and it fails with this error:

CannotPullContainerError: API error (500): Get https://XYZ.dkr.ecr.us-east-1.amazonaws.com/v2/: net/http: request cancelled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Here's my CloudFormation template (the service is intentionally commented out, and the ECR image url is scrambled):

Resources:
#Network resources: VPC 
  WorkflowVpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: "10.0.0.0/16"
      EnableDnsSupport: false
      Tags:
        - Key: Project
          Value: Workflow
#PublicSubnet
  WorkflowPublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: "10.0.0.0/24"
      VpcId: 
        Ref: WorkflowVpc
  WorkflowInternetGateway:
    Type: AWS::EC2::InternetGateway
  WorkflowVCPGatewayAttachment:
    DependsOn: 
      - WorkflowInternetGateway
      - WorkflowVpc
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      InternetGatewayId:
        Ref: WorkflowInternetGateway
      VpcId:
        Ref: WorkflowVpc
  WorkflowElasticIp:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
  WorkflowPublicSubnetRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: 
        Ref: WorkflowVpc
  PublicSubnetToRouteTable:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: WorkflowPublicSubnetRouteTable
      SubnetId: 
        Ref: WorkflowPublicSubnet
  WorkflowInternetRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: WorkflowPublicSubnetRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: 
        Ref: WorkflowInternetGateway
  WorkflowNat:
    DependsOn: 
      - WorkflowVCPGatewayAttachment
      - WorkflowElasticIp
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: 
        Fn::GetAtt:
          - WorkflowElasticIp
          - AllocationId
      SubnetId:
        Ref: WorkflowPublicSubnet
#Private subnet          
  WorkflowPrivateSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: "10.0.1.0/24"
      VpcId: 
        Ref: WorkflowVpc
  WorkflowPrivateSubnetRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: 
        Ref: WorkflowVpc
  PrivateSubnetToRouteTable:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId:
        Ref: WorkflowPrivateSubnetRouteTable
      SubnetId: 
        Ref: WorkflowPrivateSubnet
  WorkflowNatRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: WorkflowPrivateSubnetRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: 
        Ref: WorkflowNat
#Fargate:
  WorkflowFargateTask:
    Type: AWS::ECS::TaskDefinition
    Properties:
      RequiresCompatibilities: 
        - "FARGATE"
      Cpu: "256"
      Memory: "0.5GB"
      ContainerDefinitions:
        - Name: WorkflowFargateContainer
          Image: "XYZ.dkr.ecr.us-east-1.amazonaws.com/workflow:latest"
      NetworkMode: awsvpc
      ExecutionRoleArn: "arn:aws:iam::XXX:role/ecsTaskExecutionRole"

  WorkflowCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: WorkflowServiceCluster

#  WorkflowService:
#    DependsOn: 
#      - WorkflowNatRoute
#    Type: AWS::ECS::Service
#    Properties:
#      Cluster: 
#        Ref: WorkflowCluster
#      DesiredCount: 1
#      TaskDefinition:
#        Ref: WorkflowFargateTask
#      NetworkConfiguration:
#        AwsvpcConfiguration: 
#          AssignPublicIp: DISABLED
#          Subnets: 
#            - Ref: WorkflowPrivateSubnet
#      LaunchType: FARGATE

I also tried to set AssignPublicIp: ENABLED within the public subnet, and it works just fine, but it is not what I'm aiming for.

So my questions are: is my template OK, or is this a problem with Fargate/ECR?

Also, what would be the best way to debug such a behaviour? It seems that CloudWatch has no logs concerning this error...

The template looks plausible. Add an EC2 instance to the private subnet and see whether it can connect to the internet. - Steve E.
Thank you, Steve. I thought about it, but since the instance is not accessible from the internet, I wouldn't be able to ssh into it. Any idea how I could test whether it has internet access? - Igor Deruga
Update: I connected to my private EC2 instance using the bastion technique. yum update -y times out on host resolution, so I have no internet access. Will try to debug it now! Thanks a lot, @SteveE.!!! - Igor Deruga
I built a VPC from your stack with an EC2 instance in each subnet and had no issues connecting to the internet from either. However, the NAT does take a minute to become available, which may cause your Fargate task to fail. I would suggest having two CF stacks, one for the network and one for ECS. Make sure the NAT is available first. - Steve E.

1 Answer

Following Steve E.'s hints, I figured out that internet access is present; the only problem is this property on the VPC:

EnableDnsSupport: false

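Naturally, when I tried to update Linux packages or ping google.com, the host names could not be resolved. With DNS support disabled, the Amazon-provided DNS resolver is not available inside the VPC, so the Fargate task presumably could not resolve the ECR endpoint either, which would explain the timeout. Switching it to "true" resolved the problem.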
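
For reference, here is a minimal sketch of the corrected VPC resource, assuming the rest of the template from the question stays unchanged. Only the EnableDnsSupport value differs from the original; since this property defaults to true in CloudFormation, removing the line entirely should have the same effect.

```yaml
Resources:
  WorkflowVpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: "10.0.0.0/16"
      # Keep the Amazon-provided DNS resolver enabled so that host names
      # (for example the ECR registry endpoint) can be resolved in the VPC.
      EnableDnsSupport: true
      Tags:
        - Key: Project
          Value: Workflow
```

The NAT gateway and route tables were already working, which matches Steve E.'s test, so no other resource should need to change for the image pull to succeed from the private subnet.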