I've been building OpenCV with gpu support successfully for a while now, however, I've come across a situation that I can't seem to fix. After building OpenCV 3.3 with VS 2013 and CUDA 8.0, the OpenCV cpu and gpu seems to work fine on a couple of my test machines GTX 750 Ti and a GTX 950M (both with Windows 10). On another machine with a GTX 1050 Ti, the cpu calls work, but I get a "invalid device function" on my first OpenCV-cuda function call. In CMake, I've fiddled with the CUDA_ARCH_BIN and CUDA_GENERATION variables and rebuilt, but I can't seem to find a solution for this one machine. I've updated the NVidia graphics driver, tried CUDA_ARCH_BIN at 3.0,3.5,3.7,5.0, and CUDA_GENERATION at Kepler, Maxwell, and empty. All work on two of the test machines, and fail with the same error on the third. Everything I've found on the web says that this is caused by a mismatch between the GPU's compute capability and the CUDA_ARCH_BIN setting. I would think that if I set for 5.0/Maxwell, that it would run on Maxwell, Pascals, and newer. The only other variable is that the 1050 Ti is running on a Windows 7 box, and I'm praying that is not the problem. Or maybe there's an incompatibility between VS2013, Cuda 8.0, and/or OpenCV 3.3? Any ideas would be greatly appreciated.
1 Answers
Thanks to @RobertCrovella for providing the correct answer. The problem was solved by simply adding 6.1 to the CUDA_ARCH_BIN list in CMAKE. So what I ended up using was CUDA_ARCH_BIN = 5.0, 5.2, 6.0, 6.1 (since I'm only interested in Maxwell and Pascal) and I left CUDA_GENERATION empty. If you select something for CUDA_GENERATION, it automatically fills in CUDA_ARCH_BIN for you...and for me, it gave me more than I wanted.
Side note: I noticed that the more architectures you add to CUDA_ARCH_BIN, the larger the OpenCV dlls became. Which supports exactly what Robert was saying in his comments. It appears that for every architecture in the list, specific code for that architecture is added to the dll. If you don't put an arch in the list, the code will not run on that arch.
It all seems so obvious now.
Thanks again, Robert!
For those interested, here are my CUDA CMAKE settings:
