
There are 4 scenarios in the AWS VPC configuration. But let's look at these two:

  • Scenario 1: 1 public subnet.
  • Scenario 2: 1 public subnet and 1 private subnet.

Since an instance launched in a public subnet does not have an EIP (unless one is assigned), it is already not addressable from the Internet. Then:

  • Why is there a need for private subnet?
  • What exactly are the differences between private and public subnets?
The private subnet, even after assigning a public IP to machines within it, is still inaccessible from the public internet. For example, I create VPC setups with a web server in a public subnet and my database backend in a private subnet. I can connect through a customer gateway to access machines on both the private and public subnets. – user602525

4 Answers


Update: in late December, 2015, AWS announced a new feature, a Managed NAT Gateway for VPC. This optional service provides an alternative mechanism for VPC instances in a private subnet to access the Internet, where previously, the common solution was an EC2 instance on a public subnet within the VPC, functioning as a "NAT instance," providing network address translation (technically, port address translation) for instances in other, private subnets, allowing those machines to use the NAT instance's public IP address for their outbound Internet access.

The new managed NAT service does not fundamentally change the applicability of the following information, but this option is not addressed in the content that follows. A NAT instance can still be used as described, or the Managed NAT Gateway service can be provisioned, instead. An expanded version of this answer integrating more information about NAT Gateway and how it compares to a NAT instance will be forthcoming, as these are both relevant to the private/public subnet paradigm in VPC.

Note that the Internet Gateway and NAT Gateway are two different features. All VPC configurations with Internet access will have an Internet Gateway virtual object.


Understanding the distinction between "private" and "public" subnets in Amazon VPC requires an understanding of how IP routing and network address translation (NAT) work in general, and how they are specifically implemented in VPC.

The core differentiation between a public and private subnet in VPC is defined by what that subnet's default route is, in the VPC routing tables.

This configuration, in turn, dictates the validity of using, or not using, public IP addresses on instances on that particular subnet.

Each subnet has exactly one default route, which can be only one of two things:

  • the VPC's "Internet Gateway" object, in the case of a "public" subnet, or
  • a NAT device -- that is, either a NAT Gateway or an EC2 instance, performing the "NAT instance" role, in the case of a "private" subnet.
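A minimal sketch of that rule in Python, assuming a route table is modeled as a plain dict from destination CIDR to target ID. The `igw-`, `nat-`, `i-` and `eni-` IDs below are made up and only imitate AWS naming; the point is that the subnet's label depends on nothing but the target of its `0.0.0.0/0` route:

```python
# Hypothetical route table entries: destination CIDR -> target ID.
# Target IDs imitate AWS naming (igw-..., nat-..., i-..., eni-...) but are made up.
INTERNET_GATEWAY_PREFIXES = ("igw-",)
NAT_DEVICE_PREFIXES = ("nat-", "i-", "eni-")  # NAT Gateway, or a NAT instance / its ENI


def classify_subnet(routes):
    """Label a subnet "public" or "private" purely from its default route's target."""
    target = routes["0.0.0.0/0"]  # raises KeyError if the table has no default route
    if target.startswith(INTERNET_GATEWAY_PREFIXES):
        return "public"
    if target.startswith(NAT_DEVICE_PREFIXES):
        return "private"
    raise ValueError(f"unexpected default route target: {target}")


public_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0abc1234"}
private_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0def5678"}
print(classify_subnet(public_routes))   # public
print(classify_subnet(private_routes))  # private
```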

The Internet Gateway does not do any network address translation for instances without public IP addresses, so an instance without a public IP address cannot connect outward to the Internet -- to do things like downloading software updates, or accessing other AWS resources like S3¹ and SQS -- if the default route on its VPC subnet is the Internet Gateway object. So, if you are an instance on a "public" subnet, then you need a public IP address in order to do a significant number of things that servers commonly need to do.
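The outbound consequence can be written as a small decision function. This is an illustrative sketch, not an AWS API: `can_originate_internet_traffic` and its `"internet-gateway"` / `"nat"` labels are hypothetical, and it models outbound connections only.

```python
def can_originate_internet_traffic(default_route, has_public_ip):
    """Whether an instance can open connections to the Internet (outbound only).

    default_route is a hypothetical label: "internet-gateway" or "nat".
    """
    if default_route == "internet-gateway":
        # The Internet Gateway only translates for instances that have a public IP.
        return has_public_ip
    if default_route == "nat":
        # The NAT device substitutes its own public IP, so the instance needs none.
        return True
    raise ValueError(f"unknown default route: {default_route}")


for route in ("internet-gateway", "nat"):
    for public_ip in (True, False):
        print(route, public_ip, can_originate_internet_traffic(route, public_ip))
```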

For instances with only a private IP address, there's an alternative way to get outbound access to the Internet. This is where Network Address Translation² and a NAT instance come in.

The machines on a private subnet can access the Internet because the default route on a private subnet is not the VPC "Internet Gateway" object -- it is an EC2 instance configured as a NAT instance.

A NAT instance is an instance on a public subnet with a public IP and a specific configuration. There are AMIs that are pre-built to do this, or you can build your own.

When the private-addressed machines send traffic outward, the traffic is sent, by VPC, to the NAT instance, which replaces the source IP address on the packet (the private machine's private IP address) with its own public IP address, sends the traffic out to the Internet, accepts the response packets, and forwards them back to the private address of the originating machine. (It may also rewrite the source port, and in any case, it remembers the mappings so it knows which internal machine should receive the response packets). A NAT instance does not allow any "unexpected" inbound traffic to reach the private instances, unless it's been specifically configured to do so.
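To make the translation step concrete, here is a toy NAT device in Python. It is a simplified model under stated assumptions (one public IP, sequential port allocation starting at a hypothetical 1024, no mapping timeouts), not how any particular NAT AMI or the NAT Gateway is implemented:

```python
import itertools
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class NatDevice:
    """Toy NAT/PAT with a single public IP; all addresses are hypothetical."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._outbound = {}  # (private ip, private port, remote ip, remote port) -> public port
        self._inbound = {}   # (public port, remote ip, remote port) -> (private ip, private port)

    def translate_out(self, pkt):
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
        if key not in self._outbound:
            port = next(self._ports)  # the source port is rewritten, too (PAT)
            self._outbound[key] = port
            self._inbound[(port, pkt.dst_ip, pkt.dst_port)] = (pkt.src_ip, pkt.src_port)
        return pkt._replace(src_ip=self.public_ip, src_port=self._outbound[key])

    def translate_in(self, pkt):
        """Return the packet rewritten for the private host, or None to drop it."""
        if pkt.dst_ip != self.public_ip:
            return None
        mapping = self._inbound.get((pkt.dst_port, pkt.src_ip, pkt.src_port))
        if mapping is None:
            return None  # unexpected inbound traffic never reaches private instances
        private_ip, private_port = mapping
        return pkt._replace(dst_ip=private_ip, dst_port=private_port)


nat = NatDevice("203.0.113.10")
out = nat.translate_out(Packet("10.0.1.25", 40000, "198.51.100.7", 443))
print(out)   # Packet(src_ip='203.0.113.10', src_port=1024, dst_ip='198.51.100.7', dst_port=443)
back = nat.translate_in(Packet("198.51.100.7", 443, "203.0.113.10", 1024))
print(back)  # Packet(src_ip='198.51.100.7', src_port=443, dst_ip='10.0.1.25', dst_port=40000)
```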

Thus, when accessing external Internet resources from a private subnet, the traffic traverses the NAT instance, and appears to the destination to have originated from the public IP address of the NAT instance... so the response traffic comes back to the NAT instance. Neither the security group assigned to the NAT instance nor the security group assigned to the private instance needs to be configured to "allow" this response traffic, because security groups are stateful. They recognize that the response traffic is correlated with sessions originated internally, so it is automatically allowed. Unexpected traffic is, of course, denied unless the security group is configured to permit it.
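A stateful filter can be sketched as connection tracking plus inbound rules. The toy `StatefulGroup` class below is hypothetical; to keep it small, it matches inbound rules on destination port only and assumes outbound traffic is allowed (as with a security group's default egress rule):

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class StatefulGroup:
    """Toy stateful filter: inbound port rules plus tracking of connections opened from inside."""

    def __init__(self, inbound_ports):
        self.inbound_ports = set(inbound_ports)
        self.tracked = set()

    def send(self, flow):
        self.tracked.add(flow)  # remember the outbound connection

    def accepts_inbound(self, flow):
        reverse = Flow(flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)
        if reverse in self.tracked:
            return True  # a response to traffic we originated: no rule needed
        return flow.dst_port in self.inbound_ports  # otherwise an explicit rule is required


sg = StatefulGroup(inbound_ports=[])  # no inbound rules at all
sg.send(Flow("10.0.1.25", 40000, "198.51.100.7", 443))
print(sg.accepts_inbound(Flow("198.51.100.7", 443, "10.0.1.25", 40000)))  # True (response)
print(sg.accepts_inbound(Flow("198.51.100.7", 443, "10.0.1.25", 22)))     # False (unexpected)
```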

Unlike conventional IP routing, where your default gateway is on your same subnet, the way it works in VPC is different: the NAT instance for any given private subnet is always on a different subnet, and that other subnet is always a public subnet, because the NAT instance needs to have a public external IP, and its default gateway has to be the VPC "Internet Gateway" object.

Similarly... you cannot deploy an instance with a public IP on a private subnet. It doesn't work, because the default route on a private subnet is (by definition) a NAT instance (which performs NAT on the traffic), and not the Internet Gateway object (which doesn't). Inbound traffic from the Internet would hit the public IP of the instance, but the replies would try to route outward through the NAT instance, which would either drop the traffic (since it would be composed of replies to connections it's not aware of, so they'd be deemed invalid) or would rewrite the reply traffic to use its own public IP address, which wouldn't work since the external origin would not accept replies that came from an IP address other than the one they were trying to initiate communications with.
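The failure can be illustrated by checking which source address the client sees on the reply. In this sketch all addresses are hypothetical documentation ranges, and a client is assumed to accept a reply only if it comes from the address it contacted:

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str


CLIENT = "198.51.100.7"           # hypothetical Internet client
INSTANCE_PUBLIC = "203.0.113.50"  # hypothetical public IP on the private-subnet instance
INSTANCE_PRIVATE = "10.0.2.30"
NAT_PUBLIC = "203.0.113.10"       # hypothetical NAT instance public IP


def client_accepts(request, reply):
    """The client only accepts a reply from the address it originally contacted."""
    return reply is not None and reply.src == request.dst and reply.dst == request.src


def reply_via_internet_gateway(reply):
    # The Internet Gateway maps the instance's private source IP back to its public IP.
    return Packet(INSTANCE_PUBLIC, reply.dst)


def reply_via_nat(reply, drop_unknown=True):
    # The NAT never saw this connection start, so it either drops the reply
    # or rewrites the source to its own public IP; neither helps the client.
    return None if drop_unknown else Packet(NAT_PUBLIC, reply.dst)


request = Packet(CLIENT, INSTANCE_PUBLIC)
raw_reply = Packet(INSTANCE_PRIVATE, CLIENT)
print(client_accepts(request, reply_via_internet_gateway(raw_reply)))         # True
print(client_accepts(request, reply_via_nat(raw_reply)))                      # False
print(client_accepts(request, reply_via_nat(raw_reply, drop_unknown=False)))  # False
```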

In essence, then, the "private" and "public" designations are not really about accessibility or inaccessibility from the Internet. They are about the kinds of addresses that will be assigned to the instances on that subnet, which is relevant because of the need to translate -- or avoid translating -- those IP addresses for Internet interactions.

Since VPC has implicit routes from all VPC subnets to all other VPC subnets, the default route does not play a role in internal VPC traffic. Instances with private IP addresses will connect to other private IP addresses in the VPC "from" their private IP address, not "from" their public IP address (if they have one)... as long as the destination address is another private address within the VPC.
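Route selection follows longest-prefix match, so in a simple route table like the one below, the local route covering the VPC CIDR is the most specific match for any in-VPC destination and the default route is never consulted. A minimal sketch using Python's `ipaddress` module, with a hypothetical route table and a made-up NAT Gateway ID:

```python
import ipaddress

# Hypothetical route table for a private subnet in a 10.0.0.0/16 VPC.
ROUTES = {
    "10.0.0.0/16": "local",                # implicit VPC-wide route
    "0.0.0.0/0": "nat-0123456789abcdef0",  # default route to a NAT device (made-up ID)
}


def route_target(routes, destination):
    """Pick the most specific (longest-prefix) route that contains the destination."""
    addr = ipaddress.ip_address(destination)
    matching = [ipaddress.ip_network(cidr) for cidr in routes
                if addr in ipaddress.ip_network(cidr)]
    best = max(matching, key=lambda net: net.prefixlen)
    return routes[str(best)]


print(route_target(ROUTES, "10.0.1.25"))     # local
print(route_target(ROUTES, "198.51.100.7"))  # nat-0123456789abcdef0
```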

If your instances with private IP addresses never, under any circumstances, need to originate outbound Internet traffic, then they technically could be deployed on a "public" subnet and would still be inaccessible from the Internet... but under such a configuration, it is impossible for them to originate outbound traffic towards the Internet, which includes connections with other AWS infrastructure services, again, like S3¹ or SQS.


1. Regarding S3, specifically, to say that Internet access is always required is an oversimplification that will likely grow in scope over time and spread to other AWS services, as the capabilities of VPC continue to grow and evolve. There is a relatively new concept called a VPC Endpoint that allows your instances, including those with only private IP addresses, to directly access S3 from selected subnets within the VPC, without touching "the Internet," and without using a NAT instance or NAT gateway, but this does require additional configuration, and is only usable to access buckets within the same AWS region as your VPC. By default, S3 -- which is, as of this writing, the only service that has exposed the capability of creating VPC endpoints -- is only accessible from inside VPC via the Internet. When you create a VPC endpoint, this creates a prefix list (pl-xxxxxxxx) that you can use in your VPC route tables to send traffic bound for that particular AWS service directly to the service via the virtual "VPC Endpoint" object. It also solves the problem of restricting outbound access to S3 for particular instances, because the prefix list can be used in outbound security group rules, in place of a destination IP address or block -- and an S3 VPC endpoint can be subject to additional policy statements, restricting bucket access from inside, as desired.

2. As noted in the documentation, what's actually being discussed here is port as well as network address translation. It's common, though technically a bit imprecise, to refer to the combined operation as "NAT." This is somewhat akin to the way many of us tend to say "SSL" when we actually mean "TLS." We know what we're talking about, but we don't use the most correct word to describe it. "Note: We use the term NAT in this documentation to follow common IT practice, though the actual role of a NAT device is both address translation and port address translation (PAT)."
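Footnote 1 mentions using the endpoint's prefix list in outbound security group rules in place of a CIDR block. As a rough sketch of that idea (not the actual AWS evaluation logic), an egress check can expand a prefix-list ID into its CIDRs before matching; the CIDRs assigned to `pl-xxxxxxxx` here are hypothetical documentation ranges, not real S3 addresses:

```python
import ipaddress

# Hypothetical contents of the prefix list; the real list holds AWS-published S3 CIDRs.
PREFIX_LISTS = {"pl-xxxxxxxx": ["198.51.100.0/24", "203.0.113.0/24"]}

# Egress rules as (destination, port); a destination is a prefix-list ID or a CIDR.
EGRESS_RULES = [("pl-xxxxxxxx", 443)]


def egress_allowed(dst_ip, dst_port, rules=EGRESS_RULES, prefix_lists=PREFIX_LISTS):
    addr = ipaddress.ip_address(dst_ip)
    for destination, port in rules:
        cidrs = prefix_lists.get(destination, [destination])  # expand a prefix list
        if port == dst_port and any(addr in ipaddress.ip_network(c) for c in cidrs):
            return True
    return False


print(egress_allowed("198.51.100.20", 443))  # True: inside the prefix list
print(egress_allowed("192.0.2.1", 443))      # False: not covered by any rule
```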


I'd suggest a different tack - ditch "private" subnets and NAT instances / gateways. They aren't necessary. If you don't want the machine to be accessible from the internet, don't put it in a security group that allows such access.

By ditching the NAT instance / gateway, you eliminate the running cost of the instance / gateway, and you eliminate the speed limit (be it 250 Mbit/s or 10 Gbit/s).

If you have a machine that also does not need to access the internet directly (and I would ask how you are patching it*), then by all means, don't assign a public IP address.

*If the answer here is some kind of proxy, well, you're incurring an overhead, but each to his own.


I don't have the reputation to add a comment to Michael's answer above, hence adding my comment as an answer.

It is worth noting that, as of this writing, the AWS managed NAT gateway is roughly 3 times as expensive as running your own instance. This of course assumes that you only require one NAT instance (i.e., you don't have multiple NAT instances configured for failover, etc.), which is generally true for most small to medium use cases. Assuming a monthly data transfer of 100 GB via the NAT gateway:

Managed NAT gateway monthly cost = $33.48/month ($0.045/hour * 744 hours in a month) + $4.50 ($0.045 per GB data processed * 100 GB) + $10 ($0.10/GB standard AWS data transfer charges for all data transferred via the NAT gateway) = $47.98

t2.nano instance configured as a NAT instance = $4.84/month ($0.0065/hour * 744 hours in a month) + $10 ($0.10/GB standard AWS data transfer charges for all data transferred via the NAT instance) = $14.84
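The arithmetic above can be reproduced with Python's `decimal` module, using the prices quoted in this answer (they reflect pricing at the time it was written and will differ today):

```python
from decimal import Decimal

HOURS_PER_MONTH = 744  # 31 days * 24 hours, as in the figures above


def nat_gateway_monthly(gb, hourly=Decimal("0.045"), processed_per_gb=Decimal("0.045"),
                        transfer_per_gb=Decimal("0.10")):
    # hourly charge + per-GB processing charge + standard data transfer charge
    return hourly * HOURS_PER_MONTH + processed_per_gb * gb + transfer_per_gb * gb


def nat_instance_monthly(gb, hourly=Decimal("0.0065"), transfer_per_gb=Decimal("0.10")):
    # t2.nano hourly charge + standard data transfer charge (no processing fee)
    return hourly * HOURS_PER_MONTH + transfer_per_gb * gb


cents = Decimal("0.01")
gateway = nat_gateway_monthly(100).quantize(cents)
instance = nat_instance_monthly(100).quantize(cents)
print(gateway, instance, gateway - instance)  # 47.98 14.84 33.14
```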

This of course changes when you go for redundant NAT instances, since the AWS managed NAT gateway has built-in redundancy for high availability. If you don't mind the extra $33/month, then the managed NAT gateway is definitely worth the reduced headache of not having to maintain another instance. If you are running a VPN (e.g. OpenVPN) instance for access to your instances within the VPC, you could simply configure that instance to also act as your NAT instance, and then you don't have to maintain an extra instance just for NAT (although some people may frown upon the idea of combining VPN and NAT).


The answer by Michael - sqlbot makes the implicit assumption that private IP addresses are required. I think it is worthwhile to question that assumption -- do we even need to use private IP addresses in the first place? At least one commenter asked the same question.

What is the advantage of a server on a private subnet with a NAT instance [vs.] a server [in a] public subnet with a strict security policy? – abhillman Jun 24 '14 at 23:45

Imagine a scenario where you're using a VPC and you assign public IP addresses to all of your EC2 instances. Don't worry, that doesn't mean they're necessarily reachable over the internet, because you use security groups to restrict access in exactly the same way that things worked with EC2-Classic. By using public IP addresses you have the benefit of being able to easily expose certain services to a limited audience without needing to use something like an ELB. This frees you from the need to set up a NAT instance or NAT gateway. And since you need half as many subnets, you could choose to use a smaller CIDR allocation for your VPC, or you could make bigger subnets with the same size VPC. And fewer subnets means you'll be paying less for inter-AZ traffic as well.
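To put numbers on the "smaller CIDR or bigger subnets" point, here is a sketch that splits a VPC into equal-sized subnets with Python's `ipaddress` module. It assumes a hypothetical /16 VPC spread across three Availability Zones, equal subnet sizes, and the five addresses AWS reserves in every subnet:

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address of every subnet


def equal_subnets(vpc_cidr, subnet_count):
    """Return (prefix length, usable addresses) when vpc_cidr is cut into equal subnets.

    The VPC is split into the smallest power of two that is >= subnet_count.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    new_prefix = vpc.prefixlen + (subnet_count - 1).bit_length()
    first = next(vpc.subnets(new_prefix=new_prefix))
    return new_prefix, first.num_addresses - AWS_RESERVED_PER_SUBNET


# Hypothetical layout across 3 Availability Zones:
print(equal_subnets("10.0.0.0/16", 3))  # (18, 16379): public subnets only
print(equal_subnets("10.0.0.0/16", 6))  # (19, 8187): public + private per AZ
```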

So, why don't we do this? Why does AWS say the best practice is to use private IPs?

Amazon Web Services has a limited supply of public IPv4 addresses because the internet as a whole has a limited supply of public IPv4 addresses. It is in their best interest for you to use private IP addresses, which are effectively unlimited, rather than excessively consuming scarce public IPv4 addresses. You can see some evidence of this in how AWS prices Elastic IPs: an EIP attached to an instance is free, but an unused EIP costs money.

But for the sake of argument, let's assume that we don't care about the shortage of public IPv4 addresses on the internet. After all, my application is special. What happens next?

There are only two ways to attach a public IP address to an EC2 instance in a VPC.

1. Associate Public IP Address

You can request a public IP address when launching a new EC2 instance. This option appears as a checkbox in the console, as the --associate-public-ip-address flag when using aws-cli, and as the AssociatePublicIpAddress flag on an embedded network interface object when using CloudFormation. In any case, the public IP address is assigned to eth0 (DeviceIndex=0). You can only use this approach when launching a new instance. However, this comes with some drawbacks.

A disadvantage is that changing the security group of an instance that is using an embedded network interface object will force immediate replacement of the instance, at least if you're using CloudFormation.

Another disadvantage is that a public IP address assigned in this way will be lost when the instance is stopped.
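For reference, the same launch-time option expressed from Python: the helper below is hypothetical and only builds the keyword arguments for boto3's EC2 client `run_instances` call, requesting the public IP on an embedded network interface at `DeviceIndex` 0. The AMI, subnet, and security group IDs are placeholders, and nothing is sent to AWS unless you pass the result to a real client:

```python
def run_instances_params(image_id, subnet_id, security_group_ids, instance_type="t2.nano"):
    """Keyword arguments for boto3's EC2 client run_instances(), requesting a public IP."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "NetworkInterfaces": [{
            "DeviceIndex": 0,                    # the public IP goes on eth0
            "SubnetId": subnet_id,
            "Groups": list(security_group_ids),  # security groups are set on the interface here
            "AssociatePublicIpAddress": True,
        }],
    }


params = run_instances_params("ami-EXAMPLE", "subnet-EXAMPLE", ["sg-EXAMPLE"])
# With credentials configured: boto3.client("ec2").run_instances(**params)
```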

2. Elastic IP

In general, Elastic IPs are the preferred approach because they are safer. You are guaranteed to continue using the same IP address, you don't risk accidentally deleting any EC2 instances, you can freely attach/detach an Elastic IP at any time, and you have the freedom to change security groups applied to your EC2 instances.

... But AWS limits you to 5 EIPs per region. You can request more, and your request might be granted. But AWS could just as likely deny that request based on the reasoning I mentioned above. So you probably don't want to rely on EIPs if you plan on ever scaling your infrastructure beyond 5 EC2 instances per region.

In conclusion, using public IP addresses does come with some nice benefits, but you'll run into administrative or scaling problems if you try to use public IP addresses exclusively. Hopefully this helps to illustrate and explain why the best practices are the way they are.