
I am submitting a Spark job on an EMR cluster and want to see the Spark Web UI, which shows the configuration and status of the master node and the worker nodes.

Configuration Details:
Release label: emr-5.17.0
Applications: Spark 2.3.1

After the cluster starts, the only clickable link under Connections on the cluster's Summary page is "Enable Web Connection".

Option 1: I followed the steps shown under "Enable Web Connection", but it didn't work.

Option 2: I tried setting up an SSH tunnel to the master node using local port forwarding on Linux (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ssh-tunnel-local.html). I was still not able to open the Spark UI or the ResourceManager web interface.

Option 3: I tried option 2 plus configuring FoxyProxy for Firefox (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node-proxy.html), and I am still trying to open the web interfaces by typing the master public DNS followed by the port number or URL (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-web-interfaces.html).

Can someone please give me a step-by-step process for properly enabling the web interfaces so that I can monitor my Spark applications?

PS: I am using Linux (Ubuntu) with Firefox as the web browser.


2 Answers


There is no need for any of that; just get the master node URI. By default, the Spark UI runs on port 4040.

You can also find this setting (spark.ui.port) in the Spark configuration file.

To open the Spark UI, go to http://driver-node:4040. Whether it loads also depends on your permissions: if you have access, you will see the UI.
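To make the "get it from the configuration file" step concrete, here is a minimal Python sketch that reads spark.ui.port from the text of spark-defaults.conf and builds the UI URL. It assumes the file uses whitespace-separated `key value` lines; the helper names are hypothetical, and the EMR location /etc/spark/conf/spark-defaults.conf mentioned in the comments is an assumption you should verify on your cluster.

```python
# Sketch: derive the Spark UI URL from spark-defaults.conf content.
# Assumption: config lines look like "key value" (whitespace-separated);
# "key=value" lines are NOT handled by this sketch.
# Assumed EMR location of the file: /etc/spark/conf/spark-defaults.conf

DEFAULT_SPARK_UI_PORT = 4040  # Spark's default value for spark.ui.port


def spark_ui_port(conf_text: str) -> int:
    """Return spark.ui.port from spark-defaults.conf text, or the default."""
    port = DEFAULT_SPARK_UI_PORT
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key, then the rest of the line as value
        if len(parts) == 2 and parts[0] == "spark.ui.port":
            port = int(parts[1].strip())  # a later line overrides an earlier one
    return port


def spark_ui_url(driver_host: str, conf_text: str = "") -> str:
    """Build the Spark UI URL for a driver host (hypothetical helper)."""
    return f"http://{driver_host}:{spark_ui_port(conf_text)}"


if __name__ == "__main__":
    sample = "# sample config\nspark.master yarn\nspark.ui.port 4041\n"
    print(spark_ui_url("driver-node", sample))  # http://driver-node:4041
    print(spark_ui_url("driver-node"))          # http://driver-node:4040
```

Keep in mind that if the configured port is already in use on the driver host (for example by another running application), Spark binds the next free port (4041, 4042, ...), so the configured value is a starting point rather than a guarantee.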


If you are doing this at work, I assume your company restricts which ports can reach your AWS VPC. For the SSH tunnel to work, at least port 22 must be open through the firewall. You can check this by connecting to the EMR master node via SSH: if that works, port 22 is open. Then follow both parts of option 2 (Part 1 and Part 2) and you should be able to connect.

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ssh-tunnel.html?shortFooter=true
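A minimal Python sketch of the two steps in this answer: a quick TCP check that port 22 on the master is reachable, and the ssh command for Part 1 of the linked guide (an SSH tunnel with dynamic port forwarding). The helpers port_is_open and dynamic_tunnel_cmd are hypothetical; the defaults (user hadoop, local port 8157) follow the example in the AWS guide and may need adjusting. A successful TCP connect only shows that port 22 is reachable, not that your key is accepted, so actually logging in over SSH, as suggested above, remains the stronger check.

```python
import os
import socket
from typing import List


def port_is_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unreachable host, DNS failure, ...
        return False


def dynamic_tunnel_cmd(key_file: str, master_dns: str,
                       local_port: int = 8157, user: str = "hadoop") -> List[str]:
    """Build the argv for an SSH tunnel with dynamic (SOCKS) port forwarding."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_file),  # EC2 key pair used for the cluster
        "-N",                                # forward only; run no remote command
        "-D", str(local_port),               # local SOCKS proxy port for the browser
        f"{user}@{master_dns}",
    ]
```

Usage, with a placeholder DNS name: if `port_is_open("ec2-xx-xx-xx-xx.compute-1.amazonaws.com")` returns True, start the tunnel with `subprocess.run(dynamic_tunnel_cmd("~/mykeypair.pem", "ec2-xx-xx-xx-xx.compute-1.amazonaws.com"))`, which blocks until you press Ctrl+C. While it runs, point Firefox (for example via FoxyProxy, as in Part 2) at the SOCKS proxy on localhost:8157 and open the master's :8088 or :4040 pages. If the check returns False, look at the firewall and at the inbound SSH rule of the master node's security group.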

Also, the master DNS name shown on the EMR page sometimes does not work. In that case, use the actual IP address instead, e.g. xx.xx.xx.xx:4040 or xx.xx.xx.xx:8088.
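A small Python sketch of that workaround, assuming the DNS name still resolves from the machine you run it on: it uses the standard library to turn the master's DNS name into an IPv4 address and builds one URL per port (4040 and 8088 by default, as in this answer). ui_urls_by_ip is a hypothetical helper. If the name does not resolve at all, socket.gethostbyname raises socket.gaierror and you will need to take the IP address from the cluster's instance details instead. The ports must still be reachable from your browser, through the tunnel or otherwise.

```python
import socket
from typing import Dict, Iterable


def ui_urls_by_ip(master_host: str, ports: Iterable[int] = (4040, 8088)) -> Dict[int, str]:
    """Resolve master_host to an IPv4 address and return {port: URL} for each port."""
    ip = socket.gethostbyname(master_host)  # an IPv4 literal is returned unchanged
    return {port: f"http://{ip}:{port}" for port in ports}


# Example (placeholder DNS name; replace it with your master node's):
# print(ui_urls_by_ip("ec2-xx-xx-xx-xx.compute-1.amazonaws.com"))
```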