
I have some CUDA code that, when I run it under cuda-memcheck, reports no errors, exits normally, and produces the expected results. At the same time, an empty file named "cuda-memcheck-(put various numbers here).out" is created.

When I run the same program under cuda-gdb, it also exits normally with no error reports.

But when I enable memcheck inside cuda-gdb with "set cuda memcheck on" and then run the program, a file "cuda-memcheck.out" is created that says:

Starting cuda-memcheck...

cuda-memcheck encountered an error (3,2,2)

That happens as soon as I execute 'run' from within cuda-gdb. Then, soon after execution starts (in fact very close to a CUFFT kernel execution), I get the following:

Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 81, grid 82, block (0,0,0), thread (1,17,0), device 0, sm 0, warp 2, lane 5]
*** glibc detected *** cuda-gdb: double free or corruption (!prev): 0x000000001070e5d0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x774b6)[0x7f597c4814b6]
/lib/libc.so.6(cfree+0x73)[0x7f597c487c83]
/lib/libc.so.6(obstack_free+0x48)[0x7f597c48b128]
cuda-gdb[0x5c5a25]
cuda-gdb[0x4352d0]
cuda-gdb[0x5649c4]
cuda-gdb[0x5e42d7]
cuda-gdb[0x5e45fd]
cuda-gdb[0x56436c]
cuda-gdb[0x50d772]
cuda-gdb[0x50e3c3]
cuda-gdb[0x51373b]
cuda-gdb[0x50faff]
cuda-gdb[0x510194]
cuda-gdb[0x51373b]
cuda-gdb[0x50eaac]
cuda-gdb[0x504311]
cuda-gdb[0x50a09d]
cuda-gdb[0x4ffc4f]
cuda-gdb[0x410b9d]
cuda-gdb[0x519c48]
cuda-gdb[0x51a82c]
cuda-gdb[0x5f9b57]
cuda-gdb[0x519cb9]
cuda-gdb[0x5186c8]
cuda-gdb[0x51995a]
cuda-gdb[0x51373b]
cuda-gdb[0x4a1a40]
cuda-gdb[0x407529]
cuda-gdb[0x51373b]
cuda-gdb[0x408056]
cuda-gdb[0x51373b]
cuda-gdb[0x407464]
cuda-gdb[0x40742e]
/lib/libc.so.6(__libc_start_main+0xfe)[0x7f597c428d8e]
cuda-gdb[0x407339]
======= Memory map: ========
00400000-008b6000 r-xp 00000000 08:01 190634                             /usr/local/cuda/bin/cuda-gdb
00ab5000-00ab6000 r--p 004b5000 08:01 190634                             /usr/local/cuda/bin/cuda-gdb
00ab6000-00ac9000 rw-p 004b6000 08:01 190634                             /usr/local/cuda/bin/cuda-gdb
00ac9000-0f0f8000 rw-p 00000000 00:00 0 
0fef3000-11b2c000 rw-p 00000000 00:00 0                                  [heap]
7f5974000000-7f5974021000 rw-p 00000000 00:00 0 
7f5974021000-7f5978000000 ---p 00000000 00:00 0 
7f597a4df000-7f597aacc000 rw-p 00000000 00:00 0 
7f597aacc000-7f597aacd000 rw-p 00000000 00:00 0 
7f597b538000-7f597b54d000 r-xp 00000000 08:01 954799                     /lib/libgcc_s.so.1
7f597b54d000-7f597b74c000 ---p 00015000 08:01 954799                     /lib/libgcc_s.so.1
7f597b74c000-7f597b74d000 r--p 00014000 08:01 954799                     /lib/libgcc_s.so.1
7f597b74d000-7f597b74e000 rw-p 00015000 08:01 954799                     /lib/libgcc_s.so.1
7f597b75d000-7f597b764000 r-xp 00000000 08:01 954961                     /lib/libthread_db-1.0.so
7f597b764000-7f597b963000 ---p 00007000 08:01 954961                     /lib/libthread_db-1.0.so
7f597b963000-7f597b964000 r--p 00006000 08:01 954961                     /lib/libthread_db-1.0.so
7f597b964000-7f597b965000 rw-p 00007000 08:01 954961                     /lib/libthread_db-1.0.so
7f597b965000-7f597b966000 ---p 00000000 00:00 0 
7f597b966000-7f597c166000 rw-p 00000000 00:00 0 
7f597c166000-7f597c40a000 r--p 00000000 08:01 49399                      /usr/lib/locale/locale-archive
7f597c40a000-7f597c584000 r-xp 00000000 08:01 954957                     /lib/libc-2.12.1.so
7f597c584000-7f597c783000 ---p 0017a000 08:01 954957                     /lib/libc-2.12.1.so
7f597c783000-7f597c787000 r--p 00179000 08:01 954957                     /lib/libc-2.12.1.so
7f597c787000-7f597c788000 rw-p 0017d000 08:01 954957                     /lib/libc-2.12.1.so
7f597c788000-7f597c78d000 rw-p 00000000 00:00 0 
7f597c78d000-7f597c78f000 r-xp 00000000 08:01 954973                     /lib/libdl-2.12.1.so
7f597c78f000-7f597c98f000 ---p 00002000 08:01 954973                     /lib/libdl-2.12.1.so
7f597c98f000-7f597c990000 r--p 00002000 08:01 954973                     /lib/libdl-2.12.1.so
7f597c990000-7f597c991000 rw-p 00003000 08:01 954973                     /lib/libdl-2.12.1.so
7f597c991000-7f597c9b7000 r-xp 00000000 08:01 954792                     /lib/libexpat.so.1.5.2
7f597c9b7000-7f597cbb7000 ---p 00026000 08:01 954792                     /lib/libexpat.so.1.5.2
7f597cbb7000-7f597cbb9000 r--p 00026000 08:01 954792                     /lib/libexpat.so.1.5.2
7f597cbb9000-7f597cbba000 rw-p 00028000 08:01 954792                     /lib/libexpat.so.1.5.2
7f597cbba000-7f597cc3c000 r-xp 00000000 08:01 954964                     /lib/libm-2.12.1.so
7f597cc3c000-7f597ce3b000 ---p 00082000 08:01 954964                     /lib/libm-2.12.1.so
7f597ce3b000-7f597ce3c000 r--p 00081000 08:01 954964                     /lib/libm-2.12.1.so
7f597ce3c000-7f597ce3d000 rw-p 00082000 08:01 954964                     /lib/libm-2.12.1.so
7f597ce3d000-7f597ce53000 r-xp 00000000 08:01 954914                     /lib/libz.so.1.2.3.4
7f597ce53000-7f597d053000 ---p 00016000 08:01 954914                     /lib/libz.so.1.2.3.4
7f597d053000-7f597d054000 r--p 00016000 08:01 954914                     /lib/libz.so.1.2.3.4
7f597d054000-7f597d055000 rw-p 00017000 08:01 954914                     /lib/libz.so.1.2.3.4
7f597d055000-7f597d095000 r-xp 00000000 08:01 954818                     /lib/libncurses.so.5.7
7f597d095000-7f597d294000 ---p 00040000 08:01 954818                     /lib/libncurses.so.5.7
7f597d294000-7f597d298000 r--p 0003f000 08:01 954818                     /lib/libncurses.so.5.7
7f597d298000-7f597d299000 rw-p 00043000 08:01 954818                     /lib/libncurses.so.5.7
7f597d299000-7f597d2b1000 r-xp 00000000 08:01 954959                     /lib/libpthread-2.12.1.so
7f597d2b1000-7f597d4b0000 ---p 00018000 08:01 954959                     /lib/libpthread-2.12.1.so
7f597d4b0000-7f597d4b1000 r--p 00017000 08:01 954959                     /lib/libpthread-2.12.1.so
7f597d4b1000-7f597d4b2000 rw-p 00018000 08:01 954959                     /lib/libpthread-2.12.1.so
7f597d4b2000-7f597d4b6000 rw-p 00000000 00:00 0 
7f597d4b6000-7f597d4d6000 r-xp 00000000 08:01 954965                     /lib/ld-2.12.1.so
7f597d4df000-7f597d672000 rw-p 00000000 00:00 0 
7f597d672000-7f597d678000 r--p 00a01000 08:06 26722558                   /home/user/workspace/cuda/fullcu/current_debug/Default/shl_3D_cu
7f597d678000-7f597d67e000 r--p 00a15000 08:06 26722558                   /home/user/workspace/cuda/fullcu/current_debug/Default/shl_3D_cu
7f597d67e000-7f597d687000 r--p 00a06000 08:06 26722558                   /home/user/workspace/cuda/fullcu/current_debug/Default/shl_3D_cu
7f597d687000-7f597d6c0000 r--p 009c5000 08:06 26722558                   /home/user/workspace/cuda/fullcu/current_debug/Default/shl_3D_cu
7f597d6c0000-7f597d6c5000 rw-p 00000000 00:00 0 
7f597d6c5000-7f597d6ca000 r--p 009fd000 08:06 26722558                   /home/user/workspace/cuda/fullcu/current_debug/Default/shl_3D_cu
7f597d6cb000-7f597d6cd000 rw-p 00000000 00:00 0 
7f597d6cd000-7f597d6d4000 r--s 00000000 08:01 49265                      /usr/lib/gconv/gconv-modules.cache
0x000000000366a8d0 in fdividef<<<(16,1,1),(4,64,1)>>> (
Aborted

and cuda-gdb crashes.

Should I take this to mean that there really is a faulty memory access in my code? Or is it caused by the error that occurs when cuda-memcheck initializes?

Has anyone seen this behaviour before?

Thank you for any ideas.
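Because kernel launches are asynchronous, an illegal address inside one kernel is often reported only at a later, unrelated API call, which can make a fault look as if it happened "very close to" some other kernel such as a CUFFT one. A minimal sketch, assuming the CUDA runtime API: check the status of every call, and after each launch of your own kernels call `cudaGetLastError()` and `cudaDeviceSynchronize()` so the error surfaces next to the launch that caused it. Running with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous and has a similar effect without code changes. The macro names `CUDA_CHECK` and `CUDA_CHECK_KERNEL`, the `scale` kernel and the `blocksFor`/`scaleOnDevice` helpers are hypothetical, not taken from the code in the question.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line information if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,    \
                         __LINE__, #call, cudaGetErrorString(err_));    \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// A kernel launch returns no status: check the launch itself, then
// synchronize so a fault inside the kernel is reported here instead of
// at some later, unrelated API call (e.g. the next CUFFT call).
#define CUDA_CHECK_KERNEL()                                             \
    do {                                                                \
        CUDA_CHECK(cudaGetLastError());                                 \
        CUDA_CHECK(cudaDeviceSynchronize());                            \
    } while (0)

// Hypothetical helper: blocks needed to cover n elements.
inline int blocksFor(int n, int threads) {
    assert(threads > 0);
    return (n + threads - 1) / threads;
}

// Hypothetical kernel standing in for one of your own launches.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;  // guard: the last block may be partly empty
}

// Hypothetical wrapper: launch, then check immediately.
void scaleOnDevice(float *d_x, int n, float a) {
    if (n <= 0) return;  // a zero-block launch is itself an error
    const int threads = 256;
    scale<<<blocksFor(n, threads), threads>>>(d_x, n, a);
    CUDA_CHECK_KERNEL();
}
```

Checking the `cufftResult` returned by the CUFFT call in the same way, and synchronizing just before it, shows whether an error is already pending before CUFFT even runs.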


3 Answers


It looks like you have a memcheck-detectable error in your application. For some reason, cuda-gdb crashes when your application is suspended on the breakpoint.

  1. Does your cuda-gdb crash when you stop on regular breakpoints in the device code?
  2. What CUDA Toolkit and NVIDIA display driver versions are you using? We recommend trying the latest CUDA Toolkit 5.0 RC build, as it has numerous stability and functionality improvements.
  3. It would be invaluable if you could contact our team directly at [email protected] and provide more information (this way we may be able to fix the problem in the next CUDA Toolkit version). Can you also provide the application that triggers the crash?

Thank you in advance.
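The report names the faulting thread: (1,17,0) in block (0,0,0) of a launch with a (16,1,1) grid and (4,64,1) blocks. If the faulting kernel turns out to be one of your own rather than a CUFFT internal one, a common source of this kind of memcheck-detectable error is a thread index that runs past the end of an allocation. A sketch, assuming a 1-D grid of 2-D blocks; `flatIndex`, `launchFits` and `fillGuarded` are hypothetical names:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Flat index of a thread in a 1-D grid of 2-D blocks such as
// <<<(16,1,1),(4,64,1)>>>: 16 blocks of 4 x 64 = 256 threads each.
__host__ __device__ inline int flatIndex(int bx, int tx, int ty,
                                         int bdx, int bdy) {
    return bx * (bdx * bdy) + ty * bdx + tx;
}

// Host-side check: would an UNGUARDED kernel with gx blocks of
// (bdx, bdy) threads stay within a buffer of n elements?
inline bool launchFits(int gx, int bdx, int bdy, int n) {
    return gx * bdx * bdy <= n;
}

// Hypothetical kernel: each thread writes one element of out[0..n).
// Without the (i < n) guard, a launch with more threads than elements
// writes past the allocation; cuda-memcheck reports that as an invalid
// global write and cuda-gdb as an illegal address.
__global__ void fillGuarded(float *out, int n, float value) {
    int i = flatIndex(blockIdx.x, threadIdx.x, threadIdx.y,
                      blockDim.x, blockDim.y);
    if (i < n) {
        out[i] = value;
    }
}
```

For the reported thread, `flatIndex(0, 1, 17, 4, 64)` is 17 * 4 + 1 = 69, and the whole launch covers indices 0 to 4095, so any buffer such a kernel touches without a guard must hold at least 4096 elements.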


Following Eugene's question about what happens at regular breakpoints, I added some at key points in the code ('key' with respect to the debugging output I have gotten so far), and I have the following 'odd' gdb output to report:

The following is from a CUFFT kernel call:

[Launch of CUDA Kernel 41 (spVector0016B_kernelTex<(fftDirection_t)-1><<<(16,1,1),(4,64,1)>>>) on Device 0]

0x00007ffff569c8b3 in select () from /lib/libc.so.6

(cuda-gdb) n

Single stepping until exit from function select, which has no line number information.

0x00007ffff3fe8de7 in ?? () from /usr/lib/libcuda.so

Is the '??' normal output here, given that it's a CUFFT kernel?

The following is from another kernel launch, one of mine this time, and I don't understand why it says 'No such file':

Breakpoint 5, BCSG<<<(1,1,1),(512,1,1)>>> (glerror=0x200620df8, sL=0x2009ffff8, ijL=0x200aa7ffc, X=0x2006035f8, D=0x2006075f8, Pre=0x200941ff8, error=1.00000001e-10, L=700, N=1024, flag=0x200c00000, r=0x2006095f8, r0=0x20060b5f8, p1=0x20060d5f8, p2=0x20060f5f8, vv=0x2006115f8, s1=0x2006135f8, s2=0x2006155f8, t=0x2006175f8, T=0x2006195f8, r1=0x20061b600, a=0x20061cbf8, w=0x20061e1f8, beta=0x20061f7f8, ns1=0x200affffc) at BCGC_solver.cu:96

96 BCGC_solver.cu: No such file or directory.

in BCGC_solver.cu

I would also like to ask whether this:

warning: Warp(s) other than the current warp had to be single-stepped.

...happens only because the program is running under cuda-gdb, or whether it would happen in normal execution as well. My program does hang at some points in the code, and soon after it freezes for good. But I think that if it were only a matter of serialized execution, it would continue shortly afterwards, as conventional code does.
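Independently of the debugger, one common cause of a kernel that hangs is a `__syncthreads()` placed inside a branch that not every thread of the block takes; the CUDA Programming Guide allows it in conditional code only if the condition evaluates identically across the whole block, otherwise execution is likely to hang. The sketch below is a hypothetical single-block dot product (the kind of reduction a BiCGStab-style solver performs each iteration), assuming the same 512-thread block as the `BCSG<<<(1,1,1),(512,1,1)>>>` launch shown earlier; `dotSingleBlock`, `isPowerOfTwo` and `kThreads` are illustrative names, not the original solver's code.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assumed block size, matching the 512-thread single-block launch.
constexpr int kThreads = 512;

__host__ __device__ constexpr bool isPowerOfTwo(int v) {
    return v > 0 && (v & (v - 1)) == 0;
}
static_assert(isPowerOfTwo(kThreads),
              "the halving loop below needs a power-of-two block");

// Hypothetical single-block dot product; launch as <<<1, kThreads>>>.
__global__ void dotSingleBlock(const float *x, const float *y, int n,
                               float *result) {
    __shared__ float partial[kThreads];
    const int tid = threadIdx.x;

    // Each thread accumulates a strided slice of the vectors.
    float sum = 0.0f;
    for (int i = tid; i < n; i += kThreads) {
        sum += x[i] * y[i];
    }
    partial[tid] = sum;
    __syncthreads();  // reached by every thread

    // Tree reduction: only the additions are conditional, never the barrier.
    for (int stride = kThreads / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        // Moving this call inside the if above would make threads with
        // tid >= stride skip it: a divergent barrier that can hang.
        __syncthreads();
    }

    if (tid == 0) {
        *result = partial[0];
    }
}
```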


To track this issue properly, it is best to file a bug with NVIDIA. The steps for filing a bug are:

  1. Open the page http://developer.nvidia.com/cuda/join-cuda-registered-developer-program.
  2. If you are not registered, click "Join Now"; otherwise click "Login Now".
  3. Enter your e-mail and password to log in.
  4. In the left panel, under the Home section, there is a "Bug Report" item; click it to file a bug.
  5. Fill in the required items. The other items are optional, but detailed information helps us a lot in targeting and fixing the issue.
  6. If necessary, upload an attachment.
  7. On Linux systems, it is best to attach an nvidia-bug-report.
  8. If the issue is related to a specific code pattern, sample code and instructions for compiling it are needed to reproduce it.