I wrote some CUDA code that fails with a kernel launch failure (Device Illegal Address) when I run it under cuda-gdb for a particular input. Running it under cuda-memcheck reports an "Invalid write of size 4" error. The code is too big to post, so I will explain the scenario here.

I have a main kernel to which I am passing an array pointer which serves as a stack. I have a device function which is called from the main kernel and uses the stack.

__device__ void find(int v, int* p, int* pv, int n, int* d_stackContents)
{
    int d_stackTop = -1;
    *pv = p[v];
    if (*pv == -1) {
        // p[v] is -1, so report v itself
        *pv = v;
    }
    else {
        cuPrintf("Stack top is %d\n", d_stackTop);
        d_stackTop = d_stackTop + 1;
        d_stackContents[d_stackTop] = v;
        cuPrintf("Stack top is %d\n", d_stackTop);
        // follow the p[] links, pushing each visited value on the stack
        while (*pv != -1) {
            d_stackTop = d_stackTop + 1;
            d_stackContents[d_stackTop] = *pv;
            cuPrintf("Stack top is %d\n", d_stackTop);
            *pv = p[*pv];
        }
    }
}

The error is occurring at d_stackContents[d_stackTop] = *pv;

I am calling the device function in the main kernel as follows:

find(v[idx], p,&pv,n, d_stackContents);

Here idx = threadIdx.x + blockDim.x * blockIdx.x, and pv is declared as int pv;

Also, the d_stackContents array is allocated in main using cudaMalloc and passed as an argument to the main kernel.

1 Answer

This won't work unless you call your kernel with a single thread in a single block. Otherwise all threads scribble over each other's stack, because every thread starts at d_stackTop = -1 and writes into the same d_stackContents array. If you then dereference a pointer that was stored on the corrupted stack, it would immediately explain why your code tries to access an illegal address.

You need to use separate stacks for each thread, or a single stack with a stack pointer in global memory that is manipulated only via atomic operations.
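The first option (a separate stack per thread) could look roughly like the sketch below. It is not the asker's actual code: MAX_DEPTH, findRoots, runFindRoots and the roots output array are hypothetical names introduced for illustration, and the sketch assumes that no chain in p is longer than MAX_DEPTH entries. Each thread gets its own MAX_DEPTH-sized slice of one large cudaMalloc'ed buffer, and find refuses to push past the end of that slice, so an unexpectedly long chain (or a cycle in p) reports failure instead of writing out of bounds.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-thread stack capacity (assumption: no chain in p is longer).
constexpr int MAX_DEPTH = 64;

#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Follows the p[] links starting at v and pushes every visited node onto a
// stack that belongs to the calling thread only. Returns the last node of the
// chain (the one whose p[] entry is -1), or -1 if the stack would overflow.
__device__ int find(int v, const int* p, int* stack, int capacity)
{
    int top = -1;
    int cur = v;
    while (cur != -1) {
        if (top + 1 >= capacity) return -1;  // bounds check instead of a wild write
        stack[++top] = cur;
        cur = p[cur];
    }
    return stack[top];
}

__global__ void findRoots(const int* v, const int* p, int n,
                          int* stacks, int* roots)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx >= n) return;
    // Each thread works on its own slice, so threads cannot overwrite each other.
    int* myStack = stacks + (size_t)idx * MAX_DEPTH;
    roots[idx] = find(v[idx], p, myStack, MAX_DEPTH);
}

// Host-side helper: copies the inputs over, launches one thread per query,
// and returns the result for every entry of v.
std::vector<int> runFindRoots(const std::vector<int>& v,
                              const std::vector<int>& p)
{
    assert(!v.empty() && !p.empty());
    int n = (int)v.size();
    int *dV, *dP, *dStacks, *dRoots;
    CUDA_CHECK(cudaMalloc(&dV, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&dP, p.size() * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&dStacks, (size_t)n * MAX_DEPTH * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&dRoots, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dV, v.data(), n * sizeof(int),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dP, p.data(), p.size() * sizeof(int),
                          cudaMemcpyHostToDevice));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    findRoots<<<blocks, threads>>>(dV, dP, n, dStacks, dRoots);
    CUDA_CHECK(cudaGetLastError());

    std::vector<int> roots(n);
    CUDA_CHECK(cudaMemcpy(roots.data(), dRoots, n * sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dV));
    CUDA_CHECK(cudaFree(dP));
    CUDA_CHECK(cudaFree(dStacks));
    CUDA_CHECK(cudaFree(dRoots));
    return roots;
}
```

For example, with p = {-1, 0, 1, -1, 3} and v = {0, 1, 2, 3, 4}, calling runFindRoots(v, p) launches five threads and returns {0, 0, 0, 3, 3}. The cost of this layout is n * MAX_DEPTH ints of global memory. If the chains are known to be short, a small local array inside the kernel is another way to give each thread a private stack.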