I am trying to create ECS cluster and I have manually built VPC with 3 public and 3 private subnets. All 3 public subnets have IGW attached to them with 0.0.0.0/0 and all 3 private subnets have NAT Gateways attached in route tables with 0.0.0.0/0. Each of 3 NAT Gateways are in each public subnet respectively.
I have already created another ECS Cluster with the same CloudFormation template that I am trying to use now and everything worked fine.
I have compared settings between 1st and 2nd VPC (failing one) and all settings (IGW, NAT Gateway, Route Tables, NACLs, SG) are same of course IPs are adjusted to the IP of 2nd VPC. When I try to create ECS in 2nd VPC (failing one) EC2 instances in private subnets fail to connect to Amazon repository and subsequently the whole stack is rolled back because the signal from EC2 instances is never being sent to Auto Scaling Group.
Afterwards I have checked the system logs from EC2 instances and they are not able to install amazon agent. Here is excerpt from logs:
Starting cloud-init: Cloud-init v. 0.7.6 running 'modules:config' at Mon, 20 Aug 2018 06:38:04 +0000. Up 10.06 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Disable the repository, so yum won't use it by default. Yum will then
just ignore the repository until you permanently enable it again or use
--enablerepo for temporary usage:
yum-config-manager --disable <repoid>
4. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5001 milliseconds')
Aug 20 06:38:20 cloud-init[2116]: util.py[WARNING]: Package upgrade failed
Aug 20 06:38:20 cloud-init[2116]: cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
Aug 20 06:38:20 cloud-init[2116]: util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_package_update_upgrade_install.pyc'>) failed
Generating SSH2 ED25519 host key: [ OK ]
Starting sshd: [ OK ]
ntpdate: Synchronizing with time server: [ OK ]
Starting ntpd: [ OK ]
Starting sendmail: [ OK ]
Starting sm-client: [ OK ]
Starting crond: [ OK ]
Starting cgconfig service: [ OK ]
Starting docker: .[ OK ]
Starting cloud-init: Cloud-init v. 0.7.6 running 'modules:final' at Mon, 20 Aug 2018 06:38:25 +0000. Up 29.91 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper
Examining /var/tmp/yum-root-i85tqq/amazon-ssm-agent.rpm: amazon-ssm-agent-2.3.13.0-1.x86_64
Marking /var/tmp/yum-root-i85tqq/amazon-ssm-agent.rpm to be installed
Resolving Dependencies
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Disable the repository, so yum won't use it by default. Yum will then
just ignore the repository until you permanently enable it again or use
--enablerepo for temporary usage:
yum-config-manager --disable <repoid>
4. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5000 milliseconds')
Loaded plugins: priorities, update-motd, upgrade-helper
[ 53.291581] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 53.297948] Bridge firewalling registered
[ 53.304776] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[ 53.318481] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 53.510300] Initializing XFRM netlink socket
[ 53.515251] Netfilter messages via NETLINK v0.30.
[ 53.518920] ctnetlink v0.93: registering with nfnetlink.
[ 53.688086] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Disable the repository, so yum won't use it by default. Yum will then
just ignore the repository until you permanently enable it again or use
--enablerepo for temporary usage:
yum-config-manager --disable <repoid>
4. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5000 milliseconds')
Loaded plugins: priorities, update-motd, upgrade-helper
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Disable the repository, so yum won't use it by default. Yum will then
just ignore the repository until you permanently enable it again or use
--enablerepo for temporary usage:
yum-config-manager --disable <repoid>
4. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5001 milliseconds')
/var/lib/cloud/instance/scripts/part-001: line 9: /opt/aws/bin/cfn-init: No such file or directory
/var/lib/cloud/instance/scripts/part-001: line 10: /opt/aws/bin/cfn-signal: No such file or directory
Aug 20 06:39:13 cloud-init[2286]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
Aug 20 06:39:13 cloud-init[2286]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Aug 20 06:39:13 cloud-init[2286]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
I have checked NACL, for Inbound and Outbound everything is set to 0.0.0.0/0 and ALLOW.
For the 1st VPC I am using ECS optimized AMI and t2.large (no issues whatsoever) and for 2nd c5.xlarge (causing issues).
What could be still causing EC2 to being unable to reach Amazon repository?
Edit
So later on I found out 2nd VPC has S3 Endpoint attached to it. After a little bit more research I found a nice post on LinkedIn stating:
The Amazon Linux repositories are hosted on S3 and because of this it's necessary to allow access to it in the S3 endpoint policy.
So when you fire up yum it uses the magic of local DNS trickery to route to the internal S3 endpoint
I went on to update my CloudFormation template and added additional policy to the LaunchConfiguration below, but that did not help:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:Get*",
"s3:List*"
],
"Resource": [
"arn:aws:s3:::repo.eu-central-1.amazonaws.com",
"arn:aws:s3:::repo.eu-central-1.amazonaws.com/*"
],
"Effect": "Allow"
}
]
}
And Endpoint Policy looks like this:
{
"Statement": [
{
"Action": "*",
"Effect": "Allow",
"Resource": "*",
"Principal": "*"
}
]
}