I have launched an AWS EMR cluster following the steps on the EMR page. After connecting through SSH (PuTTY on Windows 7) and enabling FoxyProxy (Chrome), the cluster came up fine and its web interfaces can be reached from my laptop browser. PySpark and SparkR come with the EMR Spark 1.6.0 installation and work perfectly in the terminal. The ports for Hue etc. work fine in the following format:
ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:/
I installed Jupyter on the cluster by following the steps at http://jupyter.readthedocs.org/en/latest/install.html#using-pip and running:
sudo pip install jupyter
I then started the notebook server with:
jupyter notebook
This opened a browser inside the terminal, which I closed. The server printed the following output:
[I 14:32:12.001 NotebookApp] Writing notebook server cookie secret to /home/hadoop/.local/share/jupyter/runtime/notebook_cookie_secret
[I 14:32:12.033 NotebookApp] The port 8888 is already in use, trying another random port.
[I 14:32:12.037 NotebookApp] Serving notebooks from local directory: /home/hadoop
[I 14:32:12.037 NotebookApp] 0 active kernels
[I 14:32:12.038 NotebookApp] The Jupyter Notebook is running at: http://localhost:8889/
[I 14:32:12.038 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
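I first tried to reach it from my laptop browser at: localhost:8889/
(which didn't work, of course, since the server runs on the cluster and not on my laptop),
then at: ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8889/
(replacing the x's with the actual address), but this gave an error as well: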
This webpage is not available
ERR_CONNECTION_RESET
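In case it helps with diagnosis, here is a minimal sketch of how the port could be probed from the laptop, independently of the browser and proxy settings. The function name port_is_open and the 3-second timeout are my own choices for illustration, not anything from EMR or Jupyter. It only reports whether a TCP connection can be established; it says nothing about why a connection fails.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration only. DNS failures, refused
    connections, resets and timeouts are all reported as False.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers socket.gaierror, ConnectionRefusedError, timeouts, etc.
        return False
```

It would be called as port_is_open("ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com", 8889), with the x's replaced by the actual address. Since the log above reports the server at http://localhost:8889/, I assume it is only listening on the loopback interface of the cluster node, so a False result here would not be surprising.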
So how can I access Jupyter in my local browser when it has been installed on the head node of an EMR cluster?