
I wrote a quick program to make sure I could use the cuFFT library correctly. With a batch size of 1 I get the result I expect. However, as I increase the batch size, I get what appear to be random bytes at the end of my data buffer. If the batch size is 2, the last three entries are noise. If the batch size is 3, I get noise in the last six entries of the buffer, as well as in the last three entries of what should be the results of the second of the three transforms in the batch.

Example of bad data at the end of the results of the second transform in a batch:

7.680291 1.411589 <- good data
7.748493 1.062853
7.797380 0.710554
7.826757 0.355854
-436781318144.000000 -436781318144.000000 <- start of bad results
5349828096.000000 5000401408.000000
5511789568.000000 4813803008.000000
5664713728.000000 4619900416.000000
<- end of output

Code:

#define NX 1024
#define BATCH 4

#include <cuda.h>
#include <cufft.h>
#include <stdio.h>
#include <Windows.h>
#include <math.h>

int main()
{
    cufftHandle plan;
    cufftComplex *deviceData;
    cufftComplex *hostData;
    FILE* output;
    char fileName[256];

    int i, j;

    cudaMalloc((void**)&deviceData, NX * BATCH * sizeof(cufftComplex));
    hostData = (cufftComplex*)malloc(NX * BATCH * sizeof(cufftComplex);

    //Initialize array with a real sine wave, increasing the frequency of the wave for each transform in the batch (indexed by "j")
    for (j = 0; j < BATCH; j++)
    {
        for (i = 0; i < NX; i++)
        {
            hostData[i + j*BATCH].x = sin(i*(j+1) / (float)10);
            hostData[i + j*BATCH].y = 0;
        }
    }

    cudaMemcpy(deviceData, hostData, NX * BATCH * sizeof(cufftComplex), cudaMemcpyHostToDevice);
    cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);
    cufftExecC2C(plan, deviceData, deviceData, CUFFT_FORWARD);
    cudaThreadSynchronize();
    cudaMemcpy(hostData, deviceData, NX * BATCH * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
    cufftDestroy(plan);
    cudaFree(deviceData);

    output = fopen("outputFile.txt", "w");

    //Write one file for each transform in the batch
    for (j = 0; j < BATCH; j++)
    {
        memset(fileName, '\0', 256);
        sprintf(fileName, "outputFile_%d.txt", j);
        output = fopen(fileName, "w");
        for (i = 0; i < NX; i++)
            fprintf(output, "%f\t%f\n", hostData[i + j*BATCH].x, hostData[i + j*BATCH].y);
        fclose(output);
    }
}

1 Answer


You're mixing up BATCH and NX when indexing into your data sets. Each transform in the batch holds NX elements, so transform j starts at offset j*NX, not j*BATCH.

I think your final fprintf line should be this instead of what you have:

fprintf(output, "%f\t%f\n", hostData[i + j*NX].x, hostData[i + j*NX].y);

Likewise you need to change your data setup lines from

hostData[i + j*BATCH]...

to

hostData[i + j*NX]...

(2 instances.)
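
To see why the wrong stride matters, here is a small host-only sketch in plain C (no GPU or cuFFT needed). The helper names batch_index and count_unwritten are made up for illustration and are not part of cuFFT. It counts how many slots of an NX * BATCH buffer a fill loop never touches for a given per-transform stride. Untouched slots keep whatever malloc handed back, which is one plausible source of the noise you are seeing.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: offset of element i of transform j in a
 * contiguous batched buffer where each transform holds nx elements. */
static size_t batch_index(size_t i, size_t j, size_t nx)
{
    return i + j * nx;
}

/* Hypothetical helper: run the same double loop as the question, using
 * `stride` as the per-transform offset, and count how many of the
 * nx * batch slots are never written. Returns (size_t)-1 if the
 * bookkeeping allocation fails. */
static size_t count_unwritten(size_t nx, size_t batch, size_t stride)
{
    size_t total = nx * batch;
    size_t i, j, missing = 0;
    unsigned char *written = (unsigned char *)calloc(total, 1);

    if (written == NULL)
        return (size_t)-1;

    for (j = 0; j < batch; j++) {
        for (i = 0; i < nx; i++) {
            size_t k = i + j * stride;
            if (k < total)          /* stay inside the buffer */
                written[k] = 1;
        }
    }

    for (i = 0; i < total; i++) {
        if (!written[i])
            missing++;
    }

    free(written);
    return missing;
}
```

With NX = 1024 and BATCH = 4, count_unwritten(1024, 4, 1024) reports no gaps, while count_unwritten(1024, 4, 4) (the stride used in the question) leaves most of the buffer unwritten, because the four fill passes overlap almost entirely near the start of the array.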

And while we're at it, this line does not compile for me; it's missing a closing parenthesis:

hostData = (cufftComplex*)malloc(NX * BATCH * sizeof(cufftComplex);