
I'm hoping someone can help me with the following issue. I've tried searching around but cannot find a good solution to my problem.

I have VPC 10.0.0.0/16

Within the VPC I have divided it up into private and public subnets. I have 1 private and 1 public subnet per AZ.

So my subnets are as follows:

  • AZ us-east-2a: 10.0.1.0/24 (private), 10.0.2.0/24 (public)
  • AZ us-east-2b: 10.0.3.0/24 (private), 10.0.4.0/24 (public)
  • AZ us-east-2c: 10.0.5.0/24 (private), 10.0.6.0/24 (public)

All of that is for redundancy. For now I'm testing with a single bastion in us-east-2a, and I expect it to be able to SSH into every other EC2 instance in the VPC. That is not happening, and that's the problem I'm facing.

My bastion host is in us-east-2a in a public subnet that I've created. I am able to ssh into that successfully from my local machine.

If I attempt to ssh into an ec2 instance in the same subnet as my bastion host then it works, but for any other host in a different subnet it does not work, even though this is all within one VPC.

For testing purposes, the security group for the EC2 instances that I am trying to SSH into from the bastion is wide open (I will lock this down once I figure out the issue). Basically, I am allowing all TCP traffic in from the world on any port.

In terms of my NACLs - I have a NACL for my public network (and have associated my public subnets with that) and a NACL for my private network (and have associated my private subnets with that).

The outbound rules on my public NACL allow all TCP traffic on ports 0-65535.

The private NACL currently allows all traffic, both inbound and outbound. Again, I will tighten that up, but I relaxed these rules while troubleshooting to rule them out as the cause.

I have a public and private route table attached to my public and private subnets respectively.

The public route table has a 0.0.0.0/0 route to my internet gateway, plus the 10.0.0.0/16 local route, which should allow access to any host in the VPC.

The private route table has the 10.0.0.0/16 local route and sends all other traffic (0.0.0.0/0) to the NAT gateway.
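
As a sanity check on the routing described above, here is a minimal Python sketch of a longest-prefix route lookup. The route entries mirror the two tables in this question; the target labels (`local`, `internet-gateway`, `nat-gateway`) and the `lookup` helper are illustrative names, not AWS API calls.

```python
import ipaddress

# Route tables as described in the question; target names are illustrative labels.
PUBLIC_ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "internet-gateway"),
]
PRIVATE_ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "nat-gateway"),
]


def lookup(routes, destination):
    """Return the target of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix (most specific route) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(lookup(PUBLIC_ROUTES, "10.0.1.242"))   # bastion -> private instance
print(lookup(PRIVATE_ROUTES, "10.0.2.177"))  # private instance -> bastion
print(lookup(PRIVATE_ROUTES, "8.8.8.8"))     # destination outside the VPC
```

Under these assumptions both directions between 10.0.2.177 and 10.0.1.242 resolve to the local route, so the route tables themselves should not be what blocks the connection.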

When I SSH from the bastion to an instance in a private subnet, it just hangs and eventually times out:
[root@ip-10-0-2-177 ec2-user]# ssh [email protected]
ssh: connect to host 10.0.1.242 port 22: Connection timed out
[root@ip-10-0-2-177 ec2-user]# ssh -vvvv [email protected]
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for *
debug2: resolving "10.0.1.242" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 10.0.1.242 [10.0.1.242] port 22.
debug1: connect to address 10.0.1.242 port 22: Connection timed out
ssh: connect to host 10.0.1.242 port 22: Connection timed out

I can ping this server though:
[root@ip-10-0-2-177 ec2-user]# ping 10.0.1.242
PING 10.0.1.242 (10.0.1.242) 56(84) bytes of data.
64 bytes from 10.0.1.242: icmp_seq=1 ttl=255 time=0.403 ms
64 bytes from 10.0.1.242: icmp_seq=2 ttl=255 time=0.461 ms
64 bytes from 10.0.1.242: icmp_seq=3 ttl=255 time=0.479 ms
64 bytes from 10.0.1.242: icmp_seq=4 ttl=255 time=0.439 ms
^C
--- 10.0.1.242 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3061ms
rtt min/avg/max/mdev = 0.403/0.445/0.479/0.035 ms

Any help would be greatly appreciated, as I have looked at everything I could think of and I'm not sure where the issue is.

I would revert your NACL changes back to the defaults and double-check that your private EC2 instance has a security group ingress rule that allows tcp/22 inbound from a security group that the bastion is in. - jarmod
Thank you! Looks like it was my public network NACL inbound rules that were too restrictive. - Vishal Shah

1 Answer


The fact that you can ping the instance but not SSH to it means that your route tables and general networking are configured correctly.

That leaves:

  • Security Group
  • NACL

Since your Security Group is "wide open", it would not be differentiating between types of traffic (eg SSH vs Ping). Therefore, it is unlikely to be the problem.

In general, you should leave NACLs at their default value of "allow all" unless you have a very specific need (eg creating a DMZ).

Also, NACLs only apply to traffic entering/exiting a subnet. Given that target instances in the same subnet are working, but instances in other subnets aren't working, it again points to your NACLs as the cause of the problem.
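
One reason a NACL can break SSH across subnets is that NACLs are stateless: the reply from port 22 back to the client's ephemeral port has to be allowed by its own rule, and rules are evaluated in ascending rule-number order with an implicit deny at the end. The sketch below models that for TCP only. The public outbound and private rules follow the question; the "restrictive" public inbound rule set, the rule numbers, the client port 45000, and the 1024-65535 ephemeral range are assumptions for illustration, since the asker's actual inbound rules were not posted.

```python
def nacl_allows(rules, dest_port):
    """Evaluate TCP rules in ascending rule-number order; first match decides."""
    for number, port_from, port_to, action in sorted(rules):
        if port_from <= dest_port <= port_to:
            return action == "allow"
    return False  # implicit deny when no rule matches


# (rule number, first port, last port, action)
PUBLIC_OUT = [(100, 0, 65535, "allow")]    # as described in the question
PRIVATE_IN = [(100, 0, 65535, "allow")]    # private NACL allows everything
PRIVATE_OUT = [(100, 0, 65535, "allow")]

# Hypothetical public inbound rules: SSH only, no ephemeral return ports.
PUBLIC_IN_RESTRICTIVE = [(100, 22, 22, "allow")]
# Same rules plus an assumed ephemeral range for return traffic.
PUBLIC_IN_FIXED = [(100, 22, 22, "allow"), (110, 1024, 65535, "allow")]


def ssh_round_trip(public_in, client_port=45000):
    """Bastion (public subnet) opens SSH to an instance in a private subnet."""
    # Request: leaves the public subnet, enters the private one, dest port 22.
    request = nacl_allows(PUBLIC_OUT, 22) and nacl_allows(PRIVATE_IN, 22)
    # Reply: leaves the private subnet, re-enters the public one,
    # addressed to the client's ephemeral port.
    reply = nacl_allows(PRIVATE_OUT, client_port) and nacl_allows(public_in, client_port)
    return request and reply


print(ssh_round_trip(PUBLIC_IN_RESTRICTIVE))  # False
print(ssh_round_trip(PUBLIC_IN_FIXED))        # True
```

With the restrictive set the SYN reaches the target but the reply is dropped on its way back into the public subnet, which looks like a connection timeout from the bastion. Traffic that stays inside one subnet never crosses a NACL, which matches the same-subnet case working.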

Suggestion: Revert the NACLs to normal default settings.
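
After changing the NACLs, a quick way to re-test from the bastion is a plain TCP connect that distinguishes a timeout (packets silently dropped, as a NACL or security group would do) from a refusal (host reachable, nothing listening). This is a minimal sketch using only the Python standard library; the `probe` helper name and the 3-second default timeout are arbitrary choices.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as open, refused, or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: the network path works,
        # but nothing is listening on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: typical of traffic dropped by a filter.
        return "timed out"
    except OSError as exc:
        return f"error: {exc}"
```

Running `probe("10.0.1.242", 22)` on the bastion should move from "timed out" to "open" once the return traffic is allowed.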