I'm using the pre-built AI Platform Jupyter Notebook instances to train a model with a single Tesla K80 card. The issue is that I don't believe the model is actually training on the GPU.
nvidia-smi returns the following during training:
Note the "No running processes found" message, yet "Volatile GPU-Util" is at 100%. Something seems strange...
...And the training is excruciatingly slow.
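A quick sanity check I could run from inside the notebook (a minimal sketch, assuming a TensorFlow 2.1+ kernel) to see whether TensorFlow can see the card at all:

import tensorflow as tf

# If this prints an empty list, TensorFlow cannot see the K80 and will
# silently fall back to the CPU, which would explain the very slow training.
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices('GPU'))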
A few days ago, I was having issues with the GPU not being released after each notebook run. When this occurred I would receive an OOM (out-of-memory) error. It required me to go into the console each time, find the PID of the process holding the GPU, and use kill -9 before re-running the notebook. Today, however, I can't get the GPU to engage at all; it never shows a running process.
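For the earlier OOM problem, one thing I'm considering (a sketch, again assuming a TensorFlow 2.x kernel, and not something I've confirmed fixes it on these instances) is enabling memory growth so a fresh run doesn't claim the whole card up front; the stale kernel itself still has to be shut down to actually release its memory:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving the whole card at startup.
# This does not free memory held by a stale kernel -- that kernel still needs to
# be shut down or restarted -- but it keeps a new run from grabbing everything.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)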
I've tried two different GCP AI Platform Notebook instances (both of the available TensorFlow version options) with no luck. Am I missing something with these "pre-built" instances?
Pre-Built AI Platform Notebook Section
Just to clarify, I did not build my own instance and install Jupyter on it myself. Instead, I used the built-in Notebook instance option under the AI Platform submenu.
Do I still need to configure a setting somewhere or install a library to keep using (or reset) my chosen GPU? I was under the impression that the virtual machine was already loaded with the Nvidia stack and should be plug-and-play with GPUs.
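If the stack really is pre-installed, I'd expect something like the following (assuming a TensorFlow 2.x kernel in eager mode; the matrix sizes are arbitrary) to log the op on /device:GPU:0:

import tensorflow as tf

# Print the device every op is placed on; with a working GPU setup the matmul
# below should be logged on /device:GPU:0 rather than the CPU.
tf.debugging.set_log_device_placement(True)

a = tf.random.uniform((1000, 1000))
b = tf.random.uniform((1000, 1000))
c = tf.matmul(a, b)
print(c.device)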
Thoughts?
EDIT: Here is a full video of the issue as requested --> https://www.youtube.com/watch?v=N5Zx_ZrrtKE&feature=youtu.be
Did you try pip uninstall tensorflow and pip3 uninstall tensorflow, and then pip3 install tensorflow-gpu? – razimbres
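Before swapping packages, one way to check whether the installed wheel is a CPU-only build (a sketch; mainly relevant on TF 1.x, where tensorflow and tensorflow-gpu are separate packages):

import tensorflow as tf

# On TF 1.x the CPU-only `tensorflow` wheel and the `tensorflow-gpu` wheel are
# separate packages, so an accidental CPU-only install would show zero GPU use.
print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())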