I'm new to Docker, and especially to nvidia-docker. I'm trying to wrap my code in a Docker container and run it on some hosts, but something goes wrong and the code does not run inside the container. I have installed nvidia-docker, and the Dockerfile is taken from here. Here is my full Dockerfile:

FROM nvidia/cuda:9.1-runtime-ubuntu16.04
RUN apt-get update && apt-get install -y \
        cuda-command-line-tools-$CUDA_PKG_VERSION \
        cuda-libraries-dev-$CUDA_PKG_VERSION \
        cuda-minimal-build-$CUDA_PKG_VERSION \
&& \
    rm -rf /var/lib/apt/lists/*

ENV LIBRARY_PATH /usr/local/cuda/lib64/stubs

FROM python:3.7-slim
RUN pip install numpy
RUN apt update && \
    apt-get -y install gcc && \
    apt-get -y install g++
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ADD helmsolver /helmsolver
CMD dpkg -l | grep -i cuda
CMD cd helmsolver && bash tests.sh

The cudahelmf and cudahelmd binaries were compiled beforehand with

nvcc helm3dcudafnd.cu -o cudahelm -I/usr/local/cuda/samples/common/inc/ -lcufft -lcufftw -D DOUBLE

And here is the bash script:

#!/bin/sh
mkdir helmholtz
cd helmholtz
        mkdir build
        mkdir workdir
        mkdir src
        mkdir scripts
        ls
        cp ../cudahelmf ./build
        cp ../cudahelmd ./build
        cp ../tmp.py ./scripts/
        cd workdir
        python3 ../scripts/script1.py 21 21 1
        ../build/cudahelmd config.cfg >> results_double.txt
        ../build/cudahelmf config.cfg >> results_float.txt

To build and run the container I use

nvidia-docker build -t helm .
nvidia-docker run --rm -ti helm

After running it I get this error:

../build/cudahelmd: error while loading shared libraries: libcufft.so.9.1: cannot open shared object file: No such file or directory

What am I doing wrong? Does this happen because of the -lcufft compile option, with Docker not knowing where to find the library? Also, the container has no /usr/local/cuda/ directory after the build. That seems strange, because cuda-libraries-dev includes the cuFFT library and the installation finishes successfully.

Here is the nvcc version on the machine where the code was compiled and previously tested:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

And the nvidia-docker version:

Docker version 19.03.3, build a872fc2f86

P.S. Maybe there is an option to compile code in docker?

@RobertCrovella It didn't help - aleks
Are there any libcufft... files in your /usr/local/cuda/lib64 directory in the container? Which ones? - Robert Crovella
@RobertCrovella There is no such directory, as I mentioned before - aleks
What happens if you just start with FROM nvidia/cuda:9.1-runtime-ubuntu16.04 and don't install the next 3 cuda packages you have in your dockerfile? When I do that I have a directory /usr/local/cuda/lib64 with the needed cufft libraries in it. - Robert Crovella
@RobertCrovella Nothing happens, no new directories, especially cuda. Can you show your full Dockerfile? - aleks

1 Answer

The problem is that you are using a multi-stage Dockerfile without a COPY from one stage to the next. The final image is built only from the last stage, so you end up with a standalone python:3.7-slim container that contains nothing from the nvidia/cuda stage, including libcufft.so.9.1. You need to copy the required files into the Python stage, like this:

COPY --from=0 SOURCE DEST
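
For the Dockerfile in the question, SOURCE and DEST might look like the sketch below. This is untested and rests on assumptions: that the CUDA 9.1 shared libraries (including libcufft.so.9.1) live under /usr/local/cuda-9.1/lib64 in the nvidia/cuda image, that they load correctly on the Debian-based python:3.7-slim image, and that tests.sh sits directly inside helmsolver. The destination path and the LD_LIBRARY_PATH setting are illustrative choices, not something the original Dockerfile specifies.

```dockerfile
# Stage 0: used only as a source of the CUDA runtime libraries.
FROM nvidia/cuda:9.1-runtime-ubuntu16.04

# Final stage: only this stage ends up in the built image.
FROM python:3.7-slim
RUN pip install numpy

# Assumption: the CUDA 9.1 libraries are under /usr/local/cuda-9.1/lib64 in stage 0.
COPY --from=0 /usr/local/cuda-9.1/lib64 /usr/local/cuda/lib64

# Tell the dynamic loader where the copied libraries are.
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64

ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

COPY helmsolver /helmsolver
WORKDIR /helmsolver

# Only the last CMD in a Dockerfile takes effect, so a single CMD is kept.
CMD ["bash", "tests.sh"]
```

After rebuilding, running ldd on build/cudahelmd inside the container should show whether libcufft.so.9.1 now resolves. If it still does not, list the contents of /usr/local/cuda/lib64 in the container to check that the assumed source path was correct.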