I am trying to train a YOLO computer vision model using a container I built that includes an installation of Darknet. The container uses the NVIDIA-supplied base image nvcr.io/nvidia/cuda:9.0-devel-ubuntu16.04.
Using nvidia-docker on my local machine with a GTX 1080 Ti, training runs very fast. However, the same container running as an Azure Container Instance with a P100 GPU trains very slowly, almost as if it is not utilizing the GPU at all. I also noticed that the nvidia-smi command does not work in the container running in Azure, although it does work when I SSH into the container running locally on my machine.
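To show what I mean, these are the quick checks I run from a shell inside the container. This is just a diagnostic sketch; the exact device and library paths are my assumption and may differ between nvidia-docker and Azure:

nvidia-smi                    # should list the GPU if the driver is mounted into the container
ls /dev/nvidia*               # device nodes the GPU runtime is expected to inject
echo $LD_LIBRARY_PATH         # should include the mounted NVIDIA driver libraries
ldconfig -p | grep libcuda    # checks whether the driver-side CUDA library is resolvable

Locally all of these behave as expected; in the Azure container, nvidia-smi does not work.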
Here is the Dockerfile I am using:
FROM nvcr.io/nvidia/cuda:9.0-devel-ubuntu16.04
LABEL maintainer="[email protected]" \
description="Pre-Configured Darknet Machine Learning Environment" \
version=1.0
# Container Dependency Setup
RUN apt-get update && apt-get upgrade -y && apt-get install -y \
    software-properties-common \
    vim \
    dos2unix \
    git \
    wget \
    python3-pip \
    libopencv-dev
# setup virtual environment
WORKDIR /
RUN pip3 install virtualenv
RUN virtualenv venv
WORKDIR /venv
RUN mkdir notebooks data output
# Install Darknet
RUN git clone https://github.com/AlexeyAB/darknet
RUN sed -i 's/GPU=0/GPU=1/g' darknet/Makefile
RUN sed -i 's/OPENCV=0/OPENCV=1/g' darknet/Makefile
WORKDIR /venv/darknet
RUN make
# Install common pip packages
WORKDIR /venv
COPY requirements.txt ./
RUN . /venv/bin/activate && pip install -r requirements.txt
# Setup Environment
EXPOSE 8888
VOLUME ["/venv/notebooks", "/venv/data", "/venv/output"]
CMD . /venv/bin/activate && jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root
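One detail I am not sure about: the sed edits above only flip GPU=1 and OPENCV=1, so Darknet is compiled with whatever -gencode flags the Makefile's ARCH variable ships with. My GTX 1080 Ti is compute capability 6.1, while the P100 is 6.0. If that turns out to matter, ARCH can be overridden on the make command line instead of editing the Makefile; the -gencode values below are my assumption for these two cards:

# hypothetical alternative to the plain "RUN make" above
RUN make ARCH="-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61"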
The requirements.txt file is as shown below:
jupyter
matplotlib
numpy
opencv-python
scipy
pandas
scikit-learn
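For completeness, this is roughly how I start the container in each environment. The image and resource names below are placeholders, and the --gpu-count/--gpu-sku flags reflect my reading of the az CLI's GPU preview for Container Instances, so treat them as an assumption:

# locally, with nvidia-docker2 installed
docker run --runtime=nvidia -p 8888:8888 my-darknet-image

# on Azure, as a GPU-backed container instance (names are placeholders)
az container create --resource-group my-rg --name darknet-train \
  --image myregistry.azurecr.io/my-darknet-image \
  --gpu-count 1 --gpu-sku P100 --ports 8888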