This question is easiest to explain with an example. I'm running a dummy Python-MPI script, myscript.py, whose entire content is the following two lines.

from subprocess import call
call(['which', 'python'])

By default, the Python executable visible to the MPI cluster nodes is /usr/bin/python. I have another Python version installed in my home directory, which can be activated by running source myhome/python35tf/bin/activate.

Now I log in to, let's say, the master node of the cluster (N-0) and run the following two commands in the shell.

source myhome/python35tf/bin/activate
srun -N 4 python myscript.py

This produces the following output.

/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python

It makes sense that at least one of the outputs points to the python35tf Python executable, since I activated it on the N-0 node. But how come all the other nodes also see the same Python executable in their environments? Aren't they supposed to print /usr/bin/python? How does srun ensure that the execution environments of all nodes are in sync with that of N-0?

[UPDATE] A related question here: How does OpenMPI Secure SHell into all the compute nodes from the master node?

Try srun env | grep ^PATH. Is PATH exported? If yes, that explains why /usr/bin/python is not used. - Gilles Gouaillardet
@GillesGouaillardet So are you suggesting that the srun command internally SSHes into each worker node and sets PATH to match the master node before executing the workloads? To test this theory I ran srun -N 2 printenv and observed that some of the environment variables are synced, but not all. Is there a reference page for OpenMPI describing exactly how this works (to see what is copied and what is left out)? - Batta
First, srun is a SLURM command, not an Open MPI command. Internally, srun is not SSH-based, but it propagates some environment variables before fork&exec'ing the binary. FWIW, in my environment, PATH is propagated by srun. - Gilles Gouaillardet

1 Answer

By default, the srun command propagates the user's entire environment to the compute nodes, which is why the PATH set by activate on N-0 is seen by every task. You can control which variables are exported with the --export option.
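
If you want to see this for yourself, here is a minimal sketch of a slightly more informative myscript.py. It reports, for each task, the hostname and the python that the task's own PATH resolves to. The helper name describe_task is my own invention for illustration, and I'm assuming your Slurm setup sets SLURM_PROCID for each task (outside Slurm it simply prints "?").

```python
import os
import shutil
import socket


def describe_task(environ=None):
    """Return one line saying which python this task's PATH resolves to.

    `describe_task` is a hypothetical helper for illustration, not part
    of Slurm or Open MPI.
    """
    env = os.environ if environ is None else environ
    # Assumption: srun sets SLURM_PROCID per task; "?" when run outside Slurm.
    rank = env.get("SLURM_PROCID", "?")
    # Search the PATH this task actually received, like the shell's `which`.
    python = shutil.which("python", path=env.get("PATH", ""))
    return "task {} on {}: python -> {}".format(
        rank, socket.gethostname(), python
    )


if __name__ == "__main__":
    print(describe_task())
```

Running it as srun -N 4 python myscript.py after activating the environment should show the same python35tf path on every host if PATH was propagated. Repeating the run with a restricted --export value should change what each task sees; check the srun man page on your cluster for the accepted values.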