This question is easiest to explain with an example. I'm running a dummy Python-MPI script, myscript.py, whose content is just the following two lines.
from subprocess import call
call(['which', 'python'])
By default, the Python executable visible to the MPI cluster nodes is /usr/bin/python. I have another Python version installed in my home directory, which can be activated by running source myhome/python35tf/bin/activate.
Now I log in to, say, the master node of the cluster (N-0) and run the following two commands in the shell.
source myhome/python35tf/bin/activate
srun -N 4 python myscript.py
This produces the following output.
/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python
/path-to-users/myhome/python35tf/bin/python
It makes sense that at least one of the outputs points to the python35tf Python executable, since I activated it on the N-0 node. But how come all the other nodes also see the same Python executable in their environments? Aren't they supposed to print /usr/bin/python? How does srun ensure that the execution environments of all nodes are in sync with that of N-0?
[UPDATE] A related question here: How does OpenMPI Secure SHell into all the compute nodes from the master node?
srun env | grep ^PATH. is PATH exported ? if yes, that explains why /usr/bin/python is not used - Gilles Gouaillardet
srun command internally ssh to each worker node and sets PATH to match with the master node before executing the workloads? To confirm this theory I tried running srun -N 2 printenv and observed that some of the environment variables are synced, but not all. Is there any reference page of OpenMPI describing how this exactly works? (To see what is copied and what's left out) - Batta
srun is a SLURM command and not an Open MPI command. Internally, srun is not SSH-based, but it propagates some environment variables before fork&exec'ing the binary. FWIW, in my environment, PATH is propagated by srun. - Gilles Gouaillardet
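To make the last comment concrete, here is a minimal local sketch of the general mechanism it describes: a launcher hands a copy of its own environment to the process it starts, so the child sees the same variables. This is not SLURM's actual implementation and it runs on a single machine only. The function name run_child_with_env and the variable DEMO_PATH_MARKER are hypothetical, made up for this illustration.

```python
import os
import subprocess
import sys


def run_child_with_env(extra_env):
    """Start a child Python process with a copy of this process's
    environment plus extra_env, and return what the child reports."""
    env = os.environ.copy()   # start from the launcher's own environment
    env.update(extra_env)     # overlay the variables being "propagated"
    # The child only prints the value it sees for the hypothetical marker.
    code = "import os; print(os.environ.get('DEMO_PATH_MARKER', '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,              # the child starts with exactly this environment
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child sees the value because the launcher passed it along.
    print(run_child_with_env({"DEMO_PATH_MARKER": "set-by-launcher"}))
```

If PATH is among the variables the launcher passes along, a lookup such as which python in the child resolves against the launcher's PATH, which would be consistent with the output shown in the question.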