This question is easiest to explain with an example. I'm running a dummy Python-MPI script, myscript.py, whose content is just the following two lines.
from subprocess import call
call(['which', 'python'])
By default, the Python executable visible to the MPI cluster nodes is /usr/bin/python. I have another Python version installed in my home directory, which can be activated by running source myhome/python35tf/bin/activate.
Now I log in to, let's say, the master node of the cluster (N-0) and run the following two commands in the shell.
source myhome/python35tf/bin/activate
srun -N 4 python myscript.py
This produces the following output.
/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python
It makes sense that at least one of the outputs points to the python35tf Python executable, since I activated it on the N-0 node. But how come all the other nodes also see the same Python executable in their environments? Aren't they supposed to print /usr/bin/python? How does srun ensure that the execution environments of all nodes are in sync with that of N-0?
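To make the per-node comparison easier to read, a slightly richer version of myscript.py could print the host name alongside the interpreter and the first PATH entry. This is only a sketch: the function name describe_environment is hypothetical, and it assumes the tasks are launched by Slurm, which sets SLURM_PROCID for each task (the value is simply None outside Slurm).

```python
import os
import shutil
import socket
import sys


def describe_environment():
    """Collect the facts that decide which Python a task ends up running."""
    return {
        "host": socket.gethostname(),
        # Task rank as set by Slurm; None when not launched through srun.
        "slurm_procid": os.environ.get("SLURM_PROCID"),
        # Interpreter actually executing this script.
        "interpreter": sys.executable,
        # What `which python` would resolve to with the current PATH.
        "which_python": shutil.which("python"),
        # First PATH entry; an activated virtualenv normally puts its bin/ here.
        "path_head": os.environ.get("PATH", "").split(os.pathsep)[0],
        # Set by a virtualenv's activate script, if one was sourced.
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
    }


if __name__ == "__main__":
    info = describe_environment()
    print(" | ".join(f"{key}={value}" for key, value in info.items()))
```

Run the same way as before (srun -N 4 python myscript.py), each line of output then shows which node produced it and whether that node's PATH already starts with the virtualenv directory.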
[UPDATE] A related question here: How does OpenMPI Secure SHell into all the compute nodes from the master node?
srun env | grep ^PATH. Is PATH exported? If yes, that explains why /usr/bin/python is not used. - Gilles Gouaillardet
Does the srun command internally ssh to each worker node and set PATH to match the master node before executing the workloads? To confirm this theory I tried running srun -N 2 printenv and observed that some of the environment variables are synced, but not all. Is there any reference page of OpenMPI describing exactly how this works (to see what is copied and what's left out)? - Batta
srun is a SLURM command and not an Open MPI command. Internally, srun is not SSH-based, but it propagates some environment variables before fork&exec'ing the binary. FWIW, in my environment, PATH is propagated by srun. - Gilles Gouaillardet
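The effect described in the last comment can be imitated on a single machine. The sketch below is a local analogy only, not Slurm's implementation: it prepends a made-up directory (/hypothetical/venv/bin, standing in for the virtualenv's bin directory) to PATH, starts a child process with that environment, and checks what the child sees. The helper name path_seen_by_child is hypothetical.

```python
import os
import subprocess
import sys


def path_seen_by_child(env):
    """Start a child Python process with the given environment and return its PATH."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('PATH', ''))"],
        env=env,              # the child gets exactly this environment
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Mimic `source .../bin/activate`: put a (hypothetical) venv bin directory first.
launcher_env = dict(os.environ)
launcher_env["PATH"] = "/hypothetical/venv/bin" + os.pathsep + launcher_env.get("PATH", "")

child_path = path_seen_by_child(launcher_env)
print(child_path.split(os.pathsep)[0])
```

The child never sources any activate script, yet its PATH starts with the directory the launcher set, so a PATH lookup for python in the child would try that directory first. If srun hands the submitting shell's PATH to the remote tasks in a comparable way, that would account for all four nodes printing the python35tf executable. Which variables are actually propagated is site- and configuration-dependent; the srun man page (see the --export option) is the place to check.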