I work in a highly regulated, security-conscious financial organization. We are using AWS EMR 5.20 with both transient and persistent clusters. Due to security and architecture constraints, we can't keep the entire port range open, because the Architecture Review Board objects to it. We generally use a single master node and core nodes (no task nodes), with one security group for the master and one for the slaves (core). AWS is not telling us which specific ports we need to keep open. The master and slave nodes need to communicate with each other, so for now we have kept all TCP ports (0-65K), all UDP ports (0-65K), and all ICMP open, plus TCP port 8443, which is for master node communication with the cluster manager. Each entry in the master and slave security groups uses the master/slave security group as its source, so that the master and slaves can talk to each other. I tried opening some more ports for the master as per this
It's all trial and error for us: we look at the open-source Hadoop ports, the Cloudera Hadoop ports, and so on, and then try provisioning an EMR cluster with only those open. But it keeps failing. When the EMR master node doesn't come up, we can't even use the VPC flow logs to see which traffic is being rejected. Is anybody else running this kind of configuration? How are you able to meet these kinds of constraints? Our EMR cluster runs in a private VPC.
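
To show what I mean by using the flow logs, here is a minimal sketch of how rejected traffic could be tallied per destination port, assuming the records are in the default (version 2) VPC flow log format and are actually available. The function name `rejected_ports` and the sample records are made up for illustration; they are not output from our cluster.

```python
from collections import Counter

# Field order of the default (version 2) VPC flow log record, space separated.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def rejected_ports(lines):
    """Count REJECT records per (protocol number, destination port)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip headers, blank lines, or custom-format records
        rec = dict(zip(FIELDS, parts))
        if rec["action"] != "REJECT":
            continue  # also skips NODATA/SKIPDATA records, whose action is "-"
        counts[(int(rec["protocol"]), int(rec["dstport"]))] += 1
    return counts


# Hypothetical sample records (protocol 6 = TCP), for illustration only.
sample = [
    "2 123456789012 eni-0a1b2c3d 10.0.1.10 10.0.1.20 49152 8443 6 1 40 1550000000 1550000060 REJECT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.10 10.0.1.20 49153 8443 6 1 40 1550000000 1550000060 REJECT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.20 10.0.1.10 49154 8020 6 1 40 1550000000 1550000060 REJECT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.20 10.0.1.10 49155 22 6 5 300 1550000000 1550000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d - - - - - - - 1550000000 1550000060 - NODATA",
]

for (proto, port), n in rejected_ports(sample).most_common():
    print(f"protocol={proto} dstport={port} rejected={n}")
```

Something like this would give us a list of candidate ports to take to the review board, but only if the nodes stay up long enough to produce traffic.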