
I have been using Microsoft Visual Studio as my IDE for Python and recently started using Dask to process large CSV files. When I try to use Dask distributed, I receive numerous errors as soon as I try to launch the dashboard.

I have tried simple code in both MS VS2017 and Jupyter notebook on multiple machines. I do not receive errors in Jupyter and the dashboard loads properly. However, the code crashes and no dashboard loads under Visual Studio.

Both IDEs are running under the same environment. I am using the latest version of Dask with Python 3.6.

An example of some simple code:

from dask import dataframe as ddf
from dask import multiprocessing 
from dask.distributed import Client
client = Client()

The above code launches the Dask dashboard on localhost when run under Jupyter. Under VS2017, however, it produces a large number of errors. Below are some of them:

distributed.nanny - WARNING - Worker process 13692 exited with status 1
The thread 0x8 has exited with code 0 (0x0).

The thread 0x4 has exited with code 0 (0x0).
The thread 0x9 has exited with code 0 (0x0).
The thread 0xb has exited with code 0 (0x0).
The thread 0xa has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 15368 exited with status 1

The thread 0x5 has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 16616 exited with status 1

The thread 0x6 has exited with code 0 (0x0).
distributed.nanny - WARNING - Worker process 22288 exited with status 1

The thread 0x7 has exited with code 0 (0x0).
distributed.nanny - WARNING - Restarting worker

Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\queues.py", line 236, in _feed
    send_bytes(obj)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\multiprocessing\connection.py", line 280, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed
The thread 0x10 has exited with code 0 (0x0).
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
    result_list.append(f.result())
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start

tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
    result_list.append(f.result())
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start

tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 883, in callback
    result_list.append(f.result())
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
    raise gen.TimeoutError("Worker failed to start")
tornado.util.TimeoutError: Worker failed to start

distributed.nanny - ERROR - Failed to restart worker after its process exited
Traceback (most recent call last):
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\nanny.py", line 343, in _on_exit
    yield self.instantiate()
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1133, in run
    value = future.result()
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\distributed\nanny.py", line 276, in instantiate
    timedelta(seconds=self.death_timeout), self.process.start()
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1133, in run
    value = future.result()
  File "C:\Users\C\Anaconda3\envs\envTensorflow\lib\site-packages\tornado\gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  Fil...

Worker failed to start
Stack trace:
 >  File "C:\Users\C\Anaconda3\envs\envTensorflow\Lib\site-packages\distributed\deploy\local.py", line 316, in _start_worker
 >    raise gen.TimeoutError("Worker failed to start")

1 Answer


From the errors, it looks like Visual Studio has trouble running interactive code that uses multiprocessing the way Dask does.

The simplest solution would be to start your client without processes, so that the workers run as threads inside the main process:

client = Client(processes=False)

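Note that this has some performance implications, particularly when working with non-numeric data, since threads in a single process contend for Python's GIL when handling pure-Python objects such as strings.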
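If it helps, here is a minimal sketch of how that threads-only client might look in a standalone script. The `start_client` and `main` names are just illustrative, and the `if __name__ == "__main__":` guard is my own addition rather than something from the question. On Windows, multiprocessing starts child processes by re-importing the main script, so the guard is generally recommended if you later switch back to `processes=True`. I can't say for certain that it resolves the Visual Studio errors above.

```python
def start_client(processes=False):
    """Start a local Dask cluster and return a Client connected to it.

    With processes=False all workers run as threads in the current
    process, so no child worker processes are spawned.
    """
    # Deferred import: nothing Dask-related runs until this is called.
    from dask.distributed import Client
    return Client(processes=processes)


def main():
    try:
        client = start_client(processes=False)
    except ImportError:
        print("dask.distributed is not installed in this environment")
        return
    try:
        # Printing the client shows a summary of the scheduler and workers.
        print(client)
    finally:
        # Disconnect and stop the local cluster this client started.
        client.close()


# Hypothetical entry point: keeps cluster start-up out of module import,
# which matters on Windows if worker processes are enabled later.
if __name__ == "__main__":
    main()
```

Keeping the cluster start-up inside `main()` means importing the module never launches workers as a side effect. That is the situation multiprocessing on Windows tends to handle badly.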