1
votes

The PySpark shell starts a Java gateway using Py4J, then talks to it and passes the Python SparkContext to the Java gateway.

However, how can I find out which port the SparkContext opens? How does PySpark decide which port to use when it creates the Java gateway for the SparkContext?

Additional question:

  1. Who starts the Py4J Java process?

2 Answers

2
votes

Maybe PySpark is using the default ports. See the Py4J docs for details: https://www.py4j.org/faq.html#what-ports-are-used-by-py4j

0
votes

The port is chosen randomly from the available ports on the driver. PySpark launches the Spark Java process with the name of a temporary file as a parameter, and the Java process writes the port and auth token to that temporary file. Python then reads the temporary file and creates a Py4J gateway. You can access the Py4J gateway through sc._gateway and read the port from sc._gateway.gateway_parameters.port.
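
To make the handshake described above concrete, here is a minimal sketch using only the Python standard library. It is not PySpark's actual code: `start_fake_gateway`, `read_conn_info`, and `demo` are hypothetical names, the "gateway" is a plain socket standing in for the JVM, and the two-line text file is a simplification that is not meant to match the format Spark really writes. It only illustrates the idea of binding to port 0 so the OS picks a free port, then publishing that port and a token through a temporary file.

```python
import os
import secrets
import socket
import tempfile


def start_fake_gateway(conn_info_path):
    """Hypothetical stand-in for the JVM side of the handshake."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the OS to pick any free ephemeral port.
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    token = secrets.token_hex(16)

    # Write to a sibling file and rename it, so a reader never sees a
    # half-written file. The text format here is an assumption.
    tmp_path = conn_info_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(f"{port}\n{token}\n")
    os.replace(tmp_path, conn_info_path)
    return server, port, token


def read_conn_info(conn_info_path):
    """Hypothetical stand-in for the Python side: read port and token."""
    with open(conn_info_path) as f:
        port_line, token_line = f.read().splitlines()
    return int(port_line), token_line


def demo():
    with tempfile.TemporaryDirectory() as tmp_dir:
        conn_info_path = os.path.join(tmp_dir, "conn_info")
        server, port, token = start_fake_gateway(conn_info_path)
        try:
            read_port, read_token = read_conn_info(conn_info_path)
            # Connect once to show the published port is really listening.
            with socket.create_connection(("127.0.0.1", read_port), timeout=5):
                pass
        finally:
            server.close()
    return port, token, read_port, read_token


if __name__ == "__main__":
    port, token, read_port, read_token = demo()
    print("gateway bound to port:", port)
    print("port read from file:  ", read_port)
    print("tokens match:         ", token == read_token)
```

In a real PySpark session you would skip all of this and inspect sc._gateway.gateway_parameters.port as described above. Note that _gateway is a private attribute, so it may change between Spark versions.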