Configuration Dask Distributed

Question

I'm setting up an environment for our data scientists to work in. Currently we have a single node running JupyterHub with Anaconda and Dask installed (2 sockets with 6 cores each and 2 threads per core, plus 140 GB of RAM). When users create a LocalCluster, the default settings currently take all the available cores and memory (as far as I can tell). This is fine when done explicitly, but I want the standard LocalCluster to use less than this. Because almost everything we do is

Looking into the config, I see no options dealing with n_workers, n_threads_per_worker, n_cores, etc. For memory, dask.config.get('distributed.worker') shows two memory-related options (memory and memory-limit), both following the behaviour described here: https://distributed.dask.org/en/latest/worker.html.

I've also looked at the JupyterLab Dask extension, which lets me do all of this. However, I can't force people to use JupyterLab.

TL;DR: I want to be able to set the following defaults whenever a cluster is created:

n_workers
processes = False (I think?)
threads_per_worker
memory_limit either per worker, or for the cluster. I know this can only be a soft limit.
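For reference, these settings correspond to keyword arguments that LocalCluster already accepts when passed explicitly (n_workers, threads_per_worker, processes, memory_limit, where memory_limit applies to each worker). Below is a minimal sketch, assuming a cluster-wide memory budget that is split evenly across workers; the helper names default_cluster_kwargs and make_cluster, and the default numbers, are hypothetical and purely illustrative.

```python
# Hypothetical site defaults; the numbers are illustrative, not a recommendation.
GIB = 2**30


def default_cluster_kwargs(n_workers=4, threads_per_worker=2,
                           cluster_memory_bytes=32 * GIB, processes=False):
    """Build keyword arguments for dask.distributed.LocalCluster.

    LocalCluster's memory_limit is per worker, so the cluster-wide
    budget is divided evenly among the workers here.
    """
    if n_workers < 1 or threads_per_worker < 1:
        raise ValueError("n_workers and threads_per_worker must be >= 1")
    return {
        "n_workers": n_workers,
        "threads_per_worker": threads_per_worker,
        "processes": processes,
        # Integer bytes per worker; still only a soft limit enforced by Dask.
        "memory_limit": cluster_memory_bytes // n_workers,
    }


def make_cluster(**overrides):
    """Start a LocalCluster using the defaults above (needs dask.distributed)."""
    from dask.distributed import LocalCluster  # imported lazily on purpose
    return LocalCluster(**default_cluster_kwargs(**overrides))
```

For example, make_cluster(n_workers=2) would start two thread-based workers sharing the 32 GiB budget, i.e. 16 GiB each.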

Any suggestions for configuration is also very welcome.

MRocklin · Accepted Answer · 2019-09-21T00:47:34

As of 2019-09-20 this isn't implemented. I recommend raising a feature request at https://github.com/dask/distributed/issues/new, or even opening a pull request.
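Until something like this exists, one possible stopgap is a small wrapper module on the shared Python path that users call instead of constructing LocalCluster directly. The sketch below assumes the administrator sets a few environment variables (for example in the JupyterHub spawner environment); the SITE_DASK_* variable names, site_defaults and site_cluster are all hypothetical, and Dask itself does not read these variables.

```python
import os

# Hypothetical variables read only by this helper, not by Dask.
_DEFAULTS = {
    "SITE_DASK_N_WORKERS": "4",
    "SITE_DASK_THREADS_PER_WORKER": "2",
    "SITE_DASK_MEMORY_LIMIT": "8GB",  # per worker; LocalCluster accepts strings like this
    "SITE_DASK_PROCESSES": "false",
}


def site_defaults(environ=None):
    """Return LocalCluster keyword arguments from the environment, with fallbacks."""
    env = os.environ if environ is None else environ
    # Environment values override the built-in defaults key by key.
    values = {**_DEFAULTS, **{k: env[k] for k in _DEFAULTS if k in env}}
    return {
        "n_workers": int(values["SITE_DASK_N_WORKERS"]),
        "threads_per_worker": int(values["SITE_DASK_THREADS_PER_WORKER"]),
        "memory_limit": values["SITE_DASK_MEMORY_LIMIT"],
        "processes": values["SITE_DASK_PROCESSES"].strip().lower() in ("1", "true", "yes"),
    }


def site_cluster(**overrides):
    """Create a LocalCluster from the site defaults; explicit arguments win."""
    from dask.distributed import LocalCluster  # imported lazily on purpose
    kwargs = site_defaults()
    kwargs.update(overrides)
    return LocalCluster(**kwargs)
```

The design choice here is that explicit arguments always take precedence, so users who really need the whole machine can still ask for it.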
