
I'm trying modin, but keep getting an error:

import modin.pandas as md
import pandas as pd

PATH = 'file.csv'

%%time
df = pd.read_csv(PATH)

%%time
mdf = md.read_csv(PATH)

error:

UserWarning: Dask execution environment not yet initialized. Initializing... To remove this warning, run the following python code before doing dataframe operations:

from distributed import Client

client = Client()

Task exception was never retrieved
future: <Task finished name='Task-8' coro=<_wrap_awaitable() done, defined at C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\asyncio\tasks.py:683> exception=ImportError("cannot import name 'Popen' from partially initialized module 'multiprocessing.popen_spawn_win32' (most likely due to a circular import) (C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\multiprocessing\popen_spawn_win32.py)")>
Traceback (most recent call last):
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\asyncio\tasks.py", line 690, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\site-packages\distributed\core.py", line 290, in _
    await self.start()
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\site-packages\distributed\nanny.py", line 295, in start
    response = await self.instantiate()
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\site-packages\distributed\nanny.py", line 378, in instantiate
    result = await self.process.start()
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\site-packages\distributed\nanny.py", line 575, in start
    await self.process.start()
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\site-packages\distributed\process.py", line 34, in _call_and_set_future
    res = func(*args, **kwargs)
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\site-packages\distributed\process.py", line 202, in _start
    process.start()
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 326, in _Popen
    from .popen_spawn_win32 import Popen
ImportError: cannot import name 'Popen' from partially initialized module 'multiprocessing.popen_spawn_win32' (most likely due to a circular import) (C:\Users\Oleg\AppData\Local\Programs\Python\Python39\lib\multiprocessing\popen_spawn_win32.py)

I have popen version 0.1.20 installed, if that is of any help. Someone on SO suggested trying what the error message says (import dask.distributed and start the client), but it didn't help.

Any help is much appreciated.

P.S. I wanted to try modin a few weeks ago, but installation wasn't at all straightforward: lots of errors, mostly with ray and dask imports. I managed to get dask working somehow, but not modin, so I started learning dask's API instead. Now I decided to give modin another try since I figured dask was working fine, but no, there are still import errors and whatnot.

1 Answer

It seems that modin is automatically creating a local dask cluster of worker processes. Unfortunately, each worker imports your script so that it can see the variables you defined, and in doing so each one also tries to start a new local dask cluster.

You should try putting your code in a function (say, main()) and calling that function from a block protected by

if __name__ == "__main__":
    main()
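
The re-import behaviour can be reproduced with the standard library alone, which may help in checking whether the missing guard is really the problem. This is only a minimal sketch, not modin's or dask's actual startup code: `load` and `main` are hypothetical names, and `load` is a stand-in for the `md.read_csv(PATH)` call from the question. With the "spawn" start method (the default on Windows), each worker imports the script again, so anything left at module level runs once per worker, while the guarded call runs only in the parent.

```python
import multiprocessing as mp

PATH = "file.csv"


def load(path):
    # Hypothetical stand-in for md.read_csv(path) from the question.
    return f"loaded {path}"


def main():
    # "spawn" is the default start method on Windows: every worker process
    # imports this script again before it can run load().
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(load, [PATH, PATH])


if __name__ == "__main__":
    # Runs only in the parent process. Workers that re-import this script
    # see a different __name__ and skip this block, so no worker tries to
    # start a pool (or, with modin, a dask cluster) of its own.
    print(main())
```

In the real script, the body of `main()` would hold the `Client()` and `md.read_csv(PATH)` calls, with only imports and constants left at module level.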