cuda-memcheck reports the following for a release-mode CUDA kernel:
========= Error: process didn't terminate successfully
========= Invalid __global__ read of size 4
========= at 0x000002c8 in xx_kernel
========= by thread (0,0,0) in block (0,0)
========= Address 0x10101600014 is out of bounds
=========
========= ERROR SUMMARY: 1 error
This fault happens only in release mode. It also doesn't happen when running under cuda-gdb.
How can I take the 0x000002c8 address and determine which code is causing the fault? I've looked through the cached intermediate files (.ptx, .cubin, etc.) and see no obvious way to map that address back to the faulty source code.
This is on x86_64 Linux with CUDA 3.2.
UPDATE: Turns out it was a compiler bug in 3.2. Upgrading to 4.0 makes the memcheck error go away. Also, I was able to disassemble the CUBIN with the cuobjdump from 4.0, but since it was release mode and optimized, it was exceedingly difficult to match the disassembly to the source code.
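For anyone trying the same thing, here is a minimal sketch of how the offset from cuda-memcheck could be looked up in the disassembly. It assumes the text came from `cuobjdump -sass` and that each instruction line starts with a `/*02c8*/`-style hex offset comment, with a `Function : <mangled name>` line before each kernel. That layout differs between toolkit releases, so treat the regular expressions as assumptions to adjust. `find_instruction` is a hypothetical helper name, not part of any CUDA tool.

```python
import re

# Assumed layout: every instruction line begins with a bare hex offset in a
# C-style comment, e.g. "/*02c8*/ ...". Comments starting with "0x" (the
# instruction encoding) deliberately do not match.
OFFSET_RE = re.compile(r"^\s*/\*([0-9a-fA-F]+)\*/\s*(.*\S)\s*$")
# Assumed layout: each kernel is introduced by "Function : <mangled name>".
FUNC_RE = re.compile(r"^\s*Function\s*:\s*(\S+)")


def find_instruction(sass_text, kernel, offset, context=2):
    """Return (offset, text) pairs around `offset` inside `kernel`.

    `kernel` is matched as a substring, because the disassembly shows the
    mangled name (xx_kernel appears as something like _Z9xx_kernelPf).
    Returns an empty list if the offset is not found in that kernel.
    """
    current = None  # name of the function currently being scanned
    rows = []       # instructions collected for the requested kernel
    for line in sass_text.splitlines():
        func = FUNC_RE.match(line)
        if func:
            current = func.group(1)
            continue
        insn = OFFSET_RE.match(line)
        if insn and current is not None and kernel in current:
            rows.append((int(insn.group(1), 16), insn.group(2)))

    for i, (off, _) in enumerate(rows):
        if off == offset:
            # Include a few neighbours: the address computation that feeds
            # the faulting load usually sits just before it.
            return rows[max(0, i - context): i + context + 1]
    return []
```

Usage would be to save the output of `cuobjdump -sass` on the cubin to a text file, read it in, and call `find_instruction(text, "xx_kernel", 0x2c8)`. This only narrows things down to the faulting instruction and its neighbours. In an optimized build, matching those instructions back to source lines is still manual work.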