
I launched an AWS EMR cluster following the steps on the EMR page. After connecting over SSH (PuTTY on Windows 7) and enabling FoxyProxy (Chrome), the cluster launched fine and can be accessed in my laptop browser. PySpark and SparkR come with the EMR Spark 1.6.0 installation and work perfectly in the terminal. The ports for Hue etc. work fine in the following format:

ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:/

I installed Jupyter by following the steps on http://jupyter.readthedocs.org/en/latest/install.html#using-pip

sudo pip install jupyter

I opened a new notebook with

jupyter notebook

It opened a browser inside the terminal, which I closed. The server then printed the following output:

[I 14:32:12.001 NotebookApp] Writing notebook server cookie secret to /home/hadoop/.local/share/jupyter/runtime/notebook_cookie_secret
[I 14:32:12.033 NotebookApp] The port 8888 is already in use, trying another random port.
[I 14:32:12.037 NotebookApp] Serving notebooks from local directory: /home/hadoop
[I 14:32:12.037 NotebookApp] 0 active kernels
[I 14:32:12.038 NotebookApp] The Jupyter Notebook is running at: http://localhost:8889/
[I 14:32:12.038 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

I tried accessing it in my browser by: localhost:8889/

(didn't work of course)

then by: ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8889/

(replacing the x's), but this gave an error as well.

This webpage is not available

ERR_CONNECTION_RESET

So how can I access Jupyter in my local browser when it has been installed on the head node of an EMR cluster?

1 Answer

I haven't actually used Jupyter yet, but I tried installing and running it like you did, and I noticed that Jupyter is configured by default to listen only on localhost, which is why you can't access it from your browser.
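
To check where the server is actually reachable, a small TCP probe can help. The following is only a minimal sketch using the Python standard library; `can_connect` and `listen_on` are hypothetical helper names made up for this illustration, not part of Jupyter or EMR.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def listen_on(ip, port=0):
    """Open a listening TCP socket bound to ip (port 0 lets the OS pick a free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((ip, port))
    srv.listen(1)
    return srv


if __name__ == "__main__":
    # A socket bound to 127.0.0.1 accepts connections only via the loopback
    # interface, which is how Jupyter behaves when it listens on localhost.
    srv = listen_on("127.0.0.1")
    port = srv.getsockname()[1]
    print("loopback reachable:", can_connect("127.0.0.1", port))
    srv.close()
```

Running `can_connect("localhost", 8889)` on the head node, and then the same call with the node's hostname from another machine, should show whether the notebook server is only answering on the loopback interface.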

I then found that running "jupyter notebook --generate-config" generates a config file, ~/.jupyter/jupyter_notebook_config.py, which you can edit to make Jupyter listen on 0.0.0.0 instead of localhost. Uncomment the c.NotebookApp.ip line and set its value to '0.0.0.0'.
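
For reference, the relevant part of the edited config file might look roughly like the excerpt below. This is a sketch rather than a copy of my actual file: the `open_browser` line is an optional extra I am assuming you may want (it is not required for the fix), and the fragment only works when Jupyter loads it, since Jupyter supplies the `c` object.

```python
# ~/.jupyter/jupyter_notebook_config.py (excerpt)
# `c` is the configuration object Jupyter provides when it loads this file;
# this fragment is not meant to be run as a standalone script.

# Listen on all network interfaces instead of only localhost.
c.NotebookApp.ip = '0.0.0.0'

# Optional (an assumption, not part of the fix above): do not try to open a
# browser on the head node when the server starts.
c.NotebookApp.open_browser = False
```

Restart "jupyter notebook" after editing so that the new settings take effect.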

After doing this, I was able to access Jupyter from my browser using a URL like http://ip-10-168-157-117.ec2.internal:8888/. (Mine is listening on port 8888 by default, but I'm assuming yours started on port 8889 due to having Hue installed and listening on port 8888 already.)