I decided to try Google Cloud Datalab for a small project I'm working on, rather than a Jupyter Notebook in an Anaconda environment on an AWS instance.

How can I install a package (for example OpenCV) onto the Datalab VM so that I don't have to reinstall it every time I restart my VM? Why do the packages disappear after every restart but the updated notebooks remain persistent? Any help answering these questions and clarifying how the Datalab VM works would be very helpful.

1 Answer

The notebooks are stored in a Docker volume mount that maps to a location on the persistent disk, which is maintained across restarts of the VM.

The packages you install, however, are stored inside the running container, and a fresh container is started from the image on each restart, so they are lost.

You could create a custom Docker image with the packages baked in and use that instead. See the --image-name argument of the datalab create command.

Here is an example of a Dockerfile you could use (note that the pip package for OpenCV is named opencv-python):

FROM gcr.io/cloud-datalab/datalab:latest
RUN pip install opencv-python

Note that you'll need to build the Docker image from this Dockerfile and push it to Google Container Registry. My memory is a bit fuzzy on this, but the image may need to be marked as public.
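
As a rough sketch, the build, push, and create steps might look like the script below. The project ID (my-project), image name (datalab-opencv), and VM name (my-datalab-vm) are placeholders I made up, and I'm assuming you authenticate Docker to the registry with gcloud auth configure-docker; adjust for your setup. The script only prints the commands by default; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Hypothetical names: replace with your own project, image, and VM names.
PROJECT_ID="${PROJECT_ID:-my-project}"
IMAGE="gcr.io/${PROJECT_ID}/datalab-opencv:latest"
VM_NAME="${VM_NAME:-my-datalab-vm}"

# Print each command unless DRY_RUN=0, in which case execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build the image from the Dockerfile in the current directory.
run docker build -t "$IMAGE" .

# Let Docker authenticate to the registry (assumed auth method).
run gcloud auth configure-docker

# Push the image so the Datalab VM can pull it.
run docker push "$IMAGE"

# Create the Datalab VM from the custom image.
run datalab create --image-name "$IMAGE" "$VM_NAME"
```

Because the packages are baked into the image, they should still be there when the VM restarts and the container is recreated.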

Hope that helps!