I am trying to set up an EMR cluster with JupyterHub and S3 persistence. I have the following classification:
{
"Classification": "jupyter-s3-conf",
"Properties": {
"s3.persistence.enabled": "true",
"s3.persistence.bucket": "my-persistence-bucket"
}
}
I am installing dask with the following step (otherwise, opening the notebook would result in a 500 error):
command-runner.jar- Arguments:
/usr/bin/sudo /usr/bin/docker exec jupyterhub conda install dask
However, when I then open a new notebook, it is not persisted. The bucket stays empty. The cluster DOES have access to S3, as when running a Spark job with the same configuration which reads from and writes to S3, it can do so, with the same bucket.
However, when looking into the jupyter log on my master, I see this:
[E 2019-08-07 12:27:14.609 SingleUserNotebookApp application:574] Exception while loading config file /etc/jupyter/jupyter_notebook_config.py
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/traitlets/config/application.py", line 562, in _load_config_files
config = loader.load_config()
File "/opt/conda/lib/python3.6/site-packages/traitlets/config/loader.py", line 457, in load_config
self._read_file_as_dict()
File "/opt/conda/lib/python3.6/site-packages/traitlets/config/loader.py", line 489, in _read_file_as_dict
py3compat.execfile(conf_filename, namespace)
File "/opt/conda/lib/python3.6/site-packages/ipython_genutils/py3compat.py", line 198, in execfile
exec(compiler(f.read(), fname, 'exec'), glob, loc)
File "/etc/jupyter/jupyter_notebook_config.py", line 5, in <module>
from s3contents import S3ContentsManager
File "/opt/conda/lib/python3.6/site-packages/s3contents/__init__.py", line 15, in <module>
from .gcsmanager import GCSContentsManager
File "/opt/conda/lib/python3.6/site-packages/s3contents/gcsmanager.py", line 8, in <module>
from s3contents.gcs_fs import GCSFS
File "/opt/conda/lib/python3.6/site-packages/s3contents/gcs_fs.py", line 3, in <module>
import gcsfs
File "/opt/conda/lib/python3.6/site-packages/gcsfs/__init__.py", line 4, in <module>
from .dask_link import register as register_dask
File "/opt/conda/lib/python3.6/site-packages/gcsfs/dask_link.py", line 56, in <module>
register()
File "/opt/conda/lib/python3.6/site-packages/gcsfs/dask_link.py", line 51, in register
dask.bytes.core._filesystems['gcs'] = DaskGCSFileSystem
AttributeError: module 'dask.bytes.core' has no attribute '_filesystems'
What am I missing and what is going wrong?