1
votes

I work in a highly regulated and secure financial organization. We are using AWS EMR 5.20, with both transient and persistent EMR clusters. Due to security/architecture constraints, we can't keep the entire port range open, because the Architecture review board objects. We generally use a single master node and core nodes (no task nodes). We have one security group for the master and one security group for the slaves (core). AWS has not given us the specific ports that we should keep open. The master and slave nodes need to communicate with each other, so we have kept open "All TCP" ports (0-65K), UDP ports (0-65K), and All ICMP (0-65K), plus port 8443 (TCP), which is for master node communication with the cluster manager. Each entry in the master and slave security groups has the master/slave security group as its source, so that master and slave can communicate. I tried opening some more ports for master as per this

It's all trial and error for us: we look at the open source Hadoop ports, the Cloudera Hadoop ports, etc., and try provisioning the EMR cluster, but it keeps failing. When the EMR master node doesn't come up, we can't even use the VPC flow logs to see which traffic is being rejected. Is anybody else using this kind of configuration? How were you able to work within these kinds of constraints? Our EMR cluster runs in a private VPC.

1
By default, everything is open from master to slave and from slave to master, but NOT to the outside. As far as I know, this poses no security risk. Why do you need to close the ports used for communication between master and slave? Does it help security? - Lamanus
Our Security/Architecture group wants to limit those ports even for master-to-slave communication within our private VPC. - Ashu
Hi @Ashu, I'm facing the same problem: I need to reduce the port range. Did you manage to do it? - Max0u
Please see my answer. - Ashu

1 Answer

0
votes

I was finally able to do this. Since the EMR cluster runs in a private VPC, all of these communications are allowed:

master to master;
slave to slave;
master to slave;
slave to master

for all the port ranges. We are also using VPC endpoints: S3 endpoints, DynamoDB endpoints, and logs endpoints.
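To make the four allowed directions concrete, here is a minimal sketch of the ingress rules they imply. It only builds plain dictionaries in the `IpPermissions` shape that boto3's `authorize_security_group_ingress` accepts; it makes no AWS calls. The function name and the `sg-...-example` IDs are hypothetical placeholders, not values from our setup.

```python
def cluster_internal_rules(master_sg, slave_sg):
    """Build rules allowing all TCP and UDP ports from both cluster groups.

    Applying the returned list to BOTH security groups covers master to
    master, slave to slave, master to slave, and slave to master.
    """
    sources = [{"GroupId": master_sg}, {"GroupId": slave_sg}]
    return [
        {
            "IpProtocol": proto,
            "FromPort": 0,
            "ToPort": 65535,
            # The source is a security group, not a CIDR range, so only
            # members of the two cluster groups can connect.
            "UserIdGroupPairs": [dict(src) for src in sources],
        }
        for proto in ("tcp", "udp")
    ]


# Hypothetical group IDs, for illustration only.
rules = cluster_internal_rules("sg-master-example", "sg-slave-example")
```

With real group IDs, the same list could be passed as `IpPermissions=rules` to `authorize_security_group_ingress`, once per group. ICMP and the port 8443 rule from the question are left out of this sketch.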

The most troublesome entries were the 0.0.0.0 rules in the outbound security group that were open on port 22 (SSH), which meant that somebody within the private VPC could SSH in. We removed those as well.
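A quick way to spot such entries is to scan a group's rules for anything open to the whole address space that covers port 22. The sketch below assumes the rules are dictionaries in the shape boto3's `describe_security_groups` returns under `IpPermissions` or `IpPermissionsEgress`; the function name and the sample data are made up for illustration.

```python
def open_ssh_rules(ip_permissions):
    """Return the rules that expose port 22 to 0.0.0.0/0."""
    flagged = []
    for perm in ip_permissions:
        proto = perm.get("IpProtocol")
        if proto == "-1":
            # "-1" means all protocols and all ports.
            covers_ssh = True
        elif proto == "tcp":
            covers_ssh = perm.get("FromPort", 0) <= 22 <= perm.get("ToPort", 65535)
        else:
            covers_ssh = False
        open_to_all = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        )
        if covers_ssh and open_to_all:
            flagged.append(perm)
    return flagged


# Made-up sample rules, for illustration only.
sample = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 8443, "ToPort": 8443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 0, "ToPort": 65535,
     "UserIdGroupPairs": [{"GroupId": "sg-slave-example"}]},
]
flagged = open_ssh_rules(sample)
```

Only the first sample rule is flagged: the second does not cover port 22, and the third is restricted to a source security group rather than a CIDR range.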

The AWS documentation is outdated and not very clear. Some of its wording says that if you are using managed security groups, EMR can add entries at runtime to make things work. Pay special attention to that. We don't use managed security groups.