I'm using mixed-mode MPI+CUDA to program a GPU cluster for matrix multiplication. When I offload the multiplication to the GPUs via MPI and CUDA, I get the following error message at run time:
FATAL: Error inserting nvidia (/lib/modules/3.2.0-23-generic-pae/kernel/drivers/video/nvidia.ko): No such device
MPI is used to transfer the data blocks; upon receiving the data, each node calls a generic C function that launches a CUDA kernel. The test setup has 3 machines, each with a single GPU. I also tested a CUDA-only local version. It produced no error messages, but the results were wrong (even for small, simple algorithms).
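One way to confirm that results are wrong, and by how much, is to compare the GPU output against a plain CPU reference. The following is only a minimal sketch under stated assumptions, not the code from my project: the function names `matmul_ref` and `max_abs_diff` are hypothetical, and the matrices are assumed to be square, row-major, and stored as `float`.

```c
#include <stddef.h>

/* CPU reference (hypothetical helper): C = A * B for row-major n x n matrices. */
void matmul_ref(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Largest absolute element-wise difference between two arrays of length len.
 * Note: NaN entries are not flagged by this simple comparison. */
float max_abs_diff(const float *x, const float *y, size_t len)
{
    float worst = 0.0f;
    for (size_t i = 0; i < len; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

The idea would be to run `matmul_ref` on the same input blocks on the host, then call `max_abs_diff` on the host result and the buffer copied back from the GPU. A difference well above floating-point rounding error would suggest the kernel did not actually run or the copy back failed.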
What is the reason for this error? Please note that it appears only when I use MPI together with CUDA; the CUDA-only version does not produce it. Thanks in advance.