
When setting up a distributed Dask cluster using the basic CLI method (i.e. dask-scheduler and dask-worker), does the scheduler node also need the same environment (e.g. the same packages) as the worker nodes? I have a Docker swarm with NVIDIA Jetson AGX units (Arm64v8 + GPU) as workers and an Intel x86-64 server (no GPU) as the scheduler, so it is not easy (if possible at all) to give them the same environment. Requests will mainly come from Jupyter notebooks served by the Jetson units, so the client and computation environments are identical (different containers, but the same images); only the scheduler differs.
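For reference, the CLI setup mentioned above looks roughly like this (scheduler-host is a placeholder for the x86-64 server's address):

    # On the Intel x86-64 scheduler node:
    dask-scheduler --port 8786

    # On each Jetson worker node, pointing at the scheduler:
    dask-worker tcp://scheduler-host:8786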

1 Answer


The answer is loosely "yes". It is critical that the client and the workers share the same (or at least compatible, which is almost the same thing) versions of every package your code touches, since objects will be pickled in one environment and unpickled in the other.

For the scheduler, yes: the dask/distributed versions must match exactly, since internal messaging logic may change between releases. You should also try to keep the versions of the other packages involved in communication the same; it is hard to give an exhaustive list, and any difference might cause failure. Currently, the client.get_versions method checks the versions of: python, dask, distributed, msgpack, cloudpickle, tornado, toolz, numpy, lz4, and blosc. If these are all the same, you have a good chance.
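You can verify this from a client session; a minimal sketch, where the scheduler address is a placeholder and the exact shape of the returned mapping may vary between distributed releases:

    from dask.distributed import Client

    # Connect to the remote scheduler (replace with your own address).
    client = Client("tcp://scheduler-host:8786")

    # Collect package versions from the client, the scheduler, and every
    # worker. With check=True, a ValueError is raised if the checked
    # packages (python, dask, distributed, msgpack, ...) do not match.
    versions = client.get_versions(check=True)

    # Inspect, e.g., what the scheduler reports:
    print(versions["scheduler"]["packages"])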