CUFFT library works in CUDA 3 but gives runtime error invalid value in CUDA 4

Question

I have successfully used the CUFFT library in CUDA 3, but the same code will not run in CUDA 4. With CUDA 4, I get a runtime error (CUFFT_INVALID_VALUE) when the FFT executes. This is a forward real-to-complex 1D transform. The only change I can see in the CUFFT documentation between CUDA 3 and CUDA 4 is the addition of the FFTW compatibility mode, which I am setting to native mode.

void mexFunction( int nlhs, mxArray *plhs[],
              int nrhs, const mxArray *prhs[])
{
int Nfft, Navg, iAvg, N, n1, n2, Npsd, size[2];

float *hReal;
float *pPxx;

float *dReal;
float *dAvg, *dSum, *dWindow;
float U;
long lAvg, lSum, lWindow;
cufftHandle            hPlan;
cufftComplex *dComplex;
cufftResult result;

int nBlocks, blockSize;

if (nrhs == 12)
{
    Nfft =      mxGetScalar(prhs[0]);
    blockSize = mxGetScalar(prhs[1]);
    Navg =      mxGetScalar(prhs[2]);
    iAvg =      mxGetScalar(prhs[3]);
    U =         mxGetScalar(prhs[4]);
    n1 =        mxGetScalar(prhs[5]);
    n2 =        mxGetScalar(prhs[6]);
    hPlan =     (cufftHandle)mxGetScalar(prhs[7]);
    hReal =     (float *)mxGetData(prhs[8]);
    lWindow =   (long)mxGetScalar(prhs[9]);
    lAvg =      (long)mxGetScalar(prhs[10]);
    lSum =      (long)mxGetScalar(prhs[11]);
}
else
    mexErrMsgTxt("fftcuda: Function requires 12 inputs");

// pointers to GPU arrays
dWindow = (float *)lWindow;
dAvg = (float *)lAvg;
dSum = (float *)lSum;

// size of output array
N = Nfft/2 + 1;
Npsd = n2 - n1 + 1;
size[0] = 1;
size[1] = Npsd;

/* Allocate working arrays on device */
cudaMalloc( (void**)&dReal,sizeof(float)*Nfft);
cudaMalloc( (void**)&dComplex,sizeof(cufftComplex)*N);

/* Copy input array to the device */
cudaMemcpy( (void*)dReal,(void*)hReal,sizeof(float)*Nfft,cudaMemcpyHostToDevice);

// setup for cuda functions
nBlocks = (int)(Nfft/blockSize);

/* multiply input array by window */
cudaMult <<< nBlocks, blockSize >>> (dReal,dWindow,dReal,Nfft);

/* Execute FFT on device */
result = cufftExecR2C(hPlan, (cufftReal *)dReal, dComplex);

if (result == CUFFT_SETUP_FAILED)
    mexErrMsgTxt("CUFFT library failed to initialize.");
else if (result == CUFFT_INVALID_PLAN )
    mexErrMsgTxt("The hPlan parameter is not a valid handle.");
else if (result == CUFFT_INVALID_VALUE )
    mexErrMsgTxt("The idata or odata parameter is not valid.");
else if (result == CUFFT_EXEC_FAILED )
    mexErrMsgTxt("CUFFT failed to execute the transform on GPU.");

// setup for cuda functions
nBlocks = (int)(Npsd/blockSize) + (Npsd%blockSize);

/* Compute absolute value */
cudaAbs <<< nBlocks, blockSize >>> (&dComplex[n1-1],dReal,Npsd);

if (nlhs != 1)
    mexErrMsgTxt("fftcuda: Function requires 1 output: float pPxx");

plhs[0]=mxCreateNumericArray(2,size,mxSINGLE_CLASS,mxREAL);

pPxx = (float *)mxGetData(plhs[0]);

/* Copy result back to host */
cudaMemcpy( (void*)pPxx, (void*)dReal, sizeof(float)*Npsd,cudaMemcpyDeviceToHost);

/* free working arrays from gpu memory */
cudaFree((void*)dReal);
cudaFree((void*)dComplex);

return;
}
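
The sizes in the function above come down to a little integer arithmetic: a real-to-complex transform of length Nfft produces Nfft/2 + 1 complex outputs, and each kernel launch needs enough blocks to cover every element. Below is a minimal host-side sketch of that arithmetic, plus a bounds check on the 1-based n1..n2 range. The helper names are hypothetical and not part of the original code, and the kernels (which are not shown) are assumed to guard against out-of-range indices.

```c
#include <stdbool.h>

/* Number of complex outputs of a 1D real-to-complex FFT of length nfft. */
int r2c_output_len(int nfft)
{
    return nfft / 2 + 1;
}

/* Smallest block count with blocks * block_size >= n (ceiling division).
   Assumes n >= 0 and block_size > 0. */
int blocks_for(int n, int block_size)
{
    return (n + block_size - 1) / block_size;
}

/* True if the 1-based bin range [n1, n2] lies inside the R2C output,
   so that &dComplex[n1-1] plus (n2 - n1 + 1) elements stays in bounds. */
bool psd_range_ok(int nfft, int n1, int n2)
{
    return n1 >= 1 && n1 <= n2 && n2 <= r2c_output_len(nfft);
}
```

For comparison, the truncating Nfft/blockSize skips a trailing partial block when Nfft is not a multiple of blockSize, while Npsd/blockSize + Npsd%blockSize covers every element but can launch more blocks than needed.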

You might have wanted to mention all of this was going on inside a Matlab mex function..... — talonmies

Jack P. · Accepted Answer · 2012-09-18T19:21:36

There is no single version of the CUFFT library. CUFFT ships as part of the CUDA Toolkit, so each new toolkit release comes with its own updated version of the library.

If you are trying to use an older copy of the library with a newer version of CUDA, that is almost certainly your problem. Use the CUFFT version that ships with your CUDA Toolkit and it should work.
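
One way to see which copy of CUFFT a process actually picks up is to load the library by name and ask it for its version. The following is a minimal sketch, not part of the original answer, under these assumptions: a POSIX system with dlopen, a library file name such as libcufft.so (the exact name varies by platform and toolkit), and a CUFFT build that exports cufftGetVersion. The helper name cufft_loaded_version is hypothetical.

```c
#include <dlfcn.h>   /* dlopen, dlsym, dlclose (POSIX) */
#include <stddef.h>  /* NULL */

/* Local declaration of the cufftGetVersion signature, so this file needs no
   CUDA headers. cufftResult is an enum; treating it as int is an assumption
   that holds on common ABIs. */
typedef int (*cufft_get_version_fn)(int *version);

/* Returns the version reported by the CUFFT library that the dynamic loader
   resolves for `soname`, or -1 if the library cannot be loaded, does not
   export cufftGetVersion, or the call does not return CUFFT_SUCCESS (0). */
int cufft_loaded_version(const char *soname)
{
    void *lib = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL)
        return -1;

    int version = -1;
    cufft_get_version_fn get_version =
        (cufft_get_version_fn)dlsym(lib, "cufftGetVersion");

    /* Reset to -1 if the call itself reports an error. */
    if (get_version != NULL && get_version(&version) != 0)
        version = -1;

    dlclose(lib);
    return version;
}
```

Calling cufft_loaded_version("libcufft.so") from the same environment that runs the mex file, and comparing the result with the toolkit the code was compiled against, shows whether a stale copy is being found first. On systems where dlopen lives in a separate library, link with -ldl.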
