0
votes

I am trying to use Dask to speed up computation, because a logistic regression run aborted after 17 hours on my system. My data set has about 1 million rows.

I first ran these commands:

import dask.array as da
import dask.dataframe as dd
from dask.distributed import Client 
client = Client() 
from dask.distributed import Client 
client = Client()

The above commands ran but threw a warning:

C:\ProgramData\Anaconda3\lib\site-packages\distributed\bokeh\core.py:57: UserWarning: Port 8787 is already in use. Perhaps you already have a cluster running? Hosting the diagnostics dashboard on a random port instead. warnings.warn('\n' + msg)

Then I ran these commands:

import dask_ml.joblib
from sklearn.externals import joblib

Error: AttributeError: module 'dask.array' has no attribute 'blockwise'

Can anyone help me with how to resolve this?

1 Answer

2
votes

You should not be setting up two local clusters, which is what calling Client() twice does. The first call takes port 8787 for its diagnostics dashboard, so the second one finds the port unavailable and moves to a random port; that is the warning you see.

Error: AttributeError: module 'dask.array' has no attribute 'blockwise'

I can assure you that blockwise is indeed part of dask.array, so this suggests that your environment is not set up correctly, for example that the installed versions of dask and dask-ml do not match. Without further details on how you installed things and which versions you have, it is hard to say more. Have you run client.get_versions()?
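If it helps, here is a rough sketch of how you could collect that information without starting a cluster. It only uses the standard library (`importlib.metadata`, Python 3.8+). The package list and the helper names `installed_versions` and `has_blockwise` are my own choices for illustration, not anything provided by dask. A likely cause, though I cannot confirm it from the question, is a dask install that is older than what your dask-ml expects.

```python
import importlib
from importlib import metadata

# Distribution names as I would expect them on PyPI; this list is an
# assumption, so adjust it to whatever you actually installed.
PACKAGES = ["dask", "distributed", "dask-ml", "scikit-learn", "joblib"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_blockwise():
    """True/False if dask.array imports; None if dask.array cannot be imported."""
    try:
        da = importlib.import_module("dask.array")
    except ImportError:
        return None
    return hasattr(da, "blockwise")


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print("dask.array.blockwise available:", has_blockwise())
```

If the last line prints False, the dask you are importing does not provide `blockwise`, and upgrading dask together with distributed and dask-ml in the same environment would be the first thing I would try. Posting this output along with the result of client.get_versions() would make it much easier to say more.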