3
votes

I just set up a Spark cluster on Google Cloud using Dataproc, and I am trying to submit a simple PySpark hello-world.py job from my local machine using gcloud, as specified in the documentation - https://cloud.google.com/dataproc/submit-job

gcloud beta dataproc jobs submit pyspark --cluster cluster-1 hello-world.py

However, I am getting the following error:

15/12/28 08:54:53 WARN org.spark-project.jetty.util.component.AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.spark-project.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
...
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

I have only submitted this job once, and so I'm puzzled as to why I'm getting this error. Any help would be appreciated.

1
I think it's just a warning log. Spark will try another port automatically, so you need not worry about it. – Anil

1 Answer

3
votes

When a Spark context is created, it starts the application UI on port 4040 by default. Before the UI starts, Spark checks whether that port is in use; if it is, Spark logs the warning you saw and increments the port to 4041. It looks like something is already running on port 4040 on your cluster, so the application should emit that warning and then successfully start the UI on 4041.
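To make the "warn, then increment" behavior concrete, here is a minimal sketch in plain Python sockets of the retry logic Spark applies (governed in Spark by `spark.port.maxRetries`, which defaults to 16). The function name `bind_with_retries` and the port numbers are illustrative, not Spark's actual code:

```python
import socket

def bind_with_retries(start_port, max_retries=16):
    """Try to bind start_port; on 'Address already in use',
    increment the port and retry, mirroring Spark's UI-port behavior."""
    for offset in range(max_retries + 1):
        port = start_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            return sock, port
        except OSError:
            # Port already in use (this is where Spark logs its WARN) - try the next one.
            sock.close()
    raise OSError(f"Could not bind any port in {start_port}-{start_port + max_retries}")

# Occupy a port, then show a second bind falling through to the next port.
first, p1 = bind_with_retries(4040)
second, p2 = bind_with_retries(4040)
print(p1, p2)  # typically 4040 4041, when both ports are free locally
first.close()
second.close()
```

If you want to avoid the warning entirely, you can also point the UI at a known-free port when submitting, e.g. by passing `--properties spark.ui.port=4050` to `gcloud dataproc jobs submit pyspark`.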