
I'm using dask to distribute a large number of tasks.

All the tasks are independent and consist of running an external application.

Depending on the server used and the input arguments, the time to process a task may vary.

At a given point some workers have no more tasks to process and wait for other workers to process the remaining tasks. See the bokeh screenshot below:

[screenshot of the bokeh dashboard showing idle workers]

The documentation mentions 'work stealing', but it doesn't seem to apply here. Can I force distributed to redistribute the tasks amongst workers?
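For context, the pattern is roughly the following sketch: each task wraps a call to the external application, and the wrapped function is submitted independently to the cluster. The binary name, argument lists, and scheduler address below are placeholders, not taken from my actual setup.

```python
import subprocess

def run_app(args):
    # Hypothetical wrapper around the external application; the command
    # and its arguments are placeholders.
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout

# With dask.distributed, each call would then be submitted as an
# independent task (assuming a running scheduler):
#   from distributed import Client
#   client = Client("scheduler-address:8786")
#   futures = [client.submit(run_app, a, pure=False) for a in arg_lists]
#   results = client.gather(futures)
```

`pure=False` tells the scheduler not to deduplicate calls that happen to share arguments, which matters when the external application has side effects.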

What is the output of import distributed; distributed.__version__? - MRocklin
I'm using distributed 1.15.1 & dask 0.13.0 - Bertrand
Would you mind upgrading and seeing if this behavior changes? - MRocklin
I don't have the possibility to upgrade for now. As a workaround I tried to kill the remaining workers, hoping that it would redistribute the tasks amongst the others. It redistributed the pending tasks, but it seems that it also discarded all the results already computed by the killed workers. - Bertrand

1 Answer


I suspect that this has been resolved in recent versions. I recommend upgrading.
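In newer versions of distributed, work stealing can also be toggled explicitly through the dask configuration system. A minimal sketch, assuming a version recent enough to read the `distributed.scheduler.work-stealing` key (check your version's documentation, as the configuration layout has changed over releases):

```
# ~/.config/dask/distributed.yaml
distributed:
  scheduler:
    work-stealing: True
```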