OpenCV Cuda “invalid device function” on first cuda call

Question

I've been building OpenCV with gpu support successfully for a while now, however, I've come across a situation that I can't seem to fix. After building OpenCV 3.3 with VS 2013 and CUDA 8.0, the OpenCV cpu and gpu seems to work fine on a couple of my test machines GTX 750 Ti and a GTX 950M (both with Windows 10). On another machine with a GTX 1050 Ti, the cpu calls work, but I get a "invalid device function" on my first OpenCV-cuda function call. In CMake, I've fiddled with the CUDA_ARCH_BIN and CUDA_GENERATION variables and rebuilt, but I can't seem to find a solution for this one machine. I've updated the NVidia graphics driver, tried CUDA_ARCH_BIN at 3.0,3.5,3.7,5.0, and CUDA_GENERATION at Kepler, Maxwell, and empty. All work on two of the test machines, and fail with the same error on the third. Everything I've found on the web says that this is caused by a mismatch between the GPU's compute capability and the CUDA_ARCH_BIN setting. I would think that if I set for 5.0/Maxwell, that it would run on Maxwell, Pascals, and newer. The only other variable is that the 1050 Ti is running on a Windows 7 box, and I'm praying that is not the problem. Or maybe there's an incompatibility between VS2013, Cuda 8.0, and/or OpenCV 3.3? Any ideas would be greatly appreciated.

@RobertCrovella it occurred to me as I was writing this question that this might be the problem. I would think that setting CUDA_ARCH_BIN to 3.0, 3.2, 3.5, 3.7, 5.0, 5.2 would work on a 6.1 card. I guess maybe every compute capability you want to cover has to be in the list? Anyway...building now and will report back. — Bryan Greenway
It depends on exactly how cmake converts those entries to actual CUDA build switches. If it specifies the inclusion of PTX, you are correct. If it does not, I am correct. Since the "invalid device function" error is a pretty conclusive indication that no suitable PTX exists in the built image, I'm inclined to believe that I am correct, and consistent with your own statement: "Everything I've found on the web says that this is caused by a mismatch between the GPU's compute capability and the CUDA_ARCH_BIN setting" — Robert Crovella

Bryan Greenway Bryan Greenway · Accepted Answer · 2017-10-20T16:12:11

Thanks to @RobertCrovella for providing the correct answer. The problem was solved by simply adding 6.1 to the CUDA_ARCH_BIN list in CMAKE. So what I ended up using was CUDA_ARCH_BIN = 5.0, 5.2, 6.0, 6.1 (since I'm only interested in Maxwell and Pascal) and I left CUDA_GENERATION empty. If you select something for CUDA_GENERATION, it automatically fills in CUDA_ARCH_BIN for you...and for me, it gave me more than I wanted.

Side note: I noticed that the more architectures you add to CUDA_ARCH_BIN, the larger the OpenCV dlls became. Which supports exactly what Robert was saying in his comments. It appears that for every architecture in the list, specific code for that architecture is added to the dll. If you don't put an arch in the list, the code will not run on that arch.

It all seems so obvious now.

Thanks again, Robert!

For those interested, here are my CUDA CMAKE settings:

OpenCV Cuda “invalid device function” on first cuda call

1 Answers