
I am using the CUDA API / cuFFT API. To move data from the host to the GPU I am using the cudaMemcpy functions, as shown below. len is the number of elements in dataReal and dataImag.

void foo(const double* dataReal, const double* dataImag, size_t len)
{
    cufftDoubleComplex* inputData;
    size_t allocSizeInput = sizeof(cufftDoubleComplex)*len;
    cudaError_t allocResult = cudaMalloc((void**)&inputData, allocSizeInput);

    if (allocResult != cudaSuccess) return;

    cudaError_t copyResult;

    copyResult = cudaMemcpy2D(static_cast<void*>(inputData),
                              2 * sizeof (double),
                              static_cast<const void*>(dataReal),
                              sizeof(double),
                              sizeof(double),
                              len,
                              cudaMemcpyHostToDevice);

    copyResult &= cudaMemcpy2D(static_cast<void*>(inputData) + sizeof(double),
                              2 * sizeof (double),
                              static_cast<const void*>(dataImag),
                              sizeof(double),
                              sizeof(double),
                              len,
                              cudaMemcpyHostToDevice);

    //and so on.
}

I am aware that pointer arithmetic on void pointers is not actually allowed. The second cudaMemcpy2D still works, though: I get a compiler warning, but the data is copied correctly.

I tried using static_cast<char*>, but that doesn't compile, because a cufftDoubleComplex* cannot be static_cast to char*.

I am a bit confused why the second cudaMemcpy2D, with its pointer arithmetic on void*, works at all, since as I understand it, it shouldn't. Is the compiler implicitly assuming that the type behind void* is one byte long?

Should I change something here, for example use reinterpret_cast<char*>(inputData)?

Also, for the allocation I am using the old C-style (void**) cast, because static_cast gives me "invalid static_cast from cufftDoubleComplex** to void**". Is there a correct way to do this?

FYI: links to the cudaMemcpy2D documentation and the cudaMalloc documentation.

Try static_cast<void*>(&(inputData->y)) (instead of + ...) and use sizeof(cufftDoubleComplex) instead of 2 * sizeof(double) (even if it is the same value, the first one is more generic). – Holt
It's not clear why you feel the need to cast anything. cudaMalloc does not require that you cast to void **, and neither does cudaMemcpy2D require you to cast to void *. – Robert Crovella
cudaMalloc expects a void** and cudaMemcpy2D expects a void*. I know for a fact that both of them work on bytes and not on types. I would actually like to have a char*, but that is not what the CUDA API wants me to use. – FreddyKay
You don't need to do any casting. (Try it.) Just pass whatever pointer or computed pointer you have (e.g. &(double *)) to cudaMalloc. Likewise for cudaMemcpy (i.e. a double *). Even if you were going to use a cast (again, unnecessary), you should do all your pointer arithmetic first, in whatever type is relevant (e.g. double *), and then cast as the final step. This would completely avoid any pointer arithmetic on void *. – Robert Crovella
In that case, the pointers are converted implicitly, aren't they? To be honest, I prefer doing it manually, so that all types are explicit in my code and anyone reading it can immediately see what is done. At the same time, it does make sense to do the arithmetic before casting. – FreddyKay
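Robert Crovella's point can be checked with a host-only sketch. In C++ a T** does not convert implicitly to void** (only T* converts to void*), which is why static_cast<void**> is rejected. cudaMalloc(&inputData, allocSizeInput) still compiles without a cast when cuda_runtime.h is included (nvcc includes it automatically for .cu files), because that header adds a C++ template overload of cudaMalloc taking T**. The sketch below imitates that pattern with hypothetical stand-ins (Complex, fakeMalloc, fakeCopy and setImag are illustrative names, not CUDA API) so it builds with an ordinary C++ compiler, and it passes &data[i].y, a double*, where a void* is expected, with no cast at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for cufftDoubleComplex (double2: x = real, y = imaginary).
struct Complex {
    double x, y;
};

// Hypothetical stand-in for the C-style cudaMalloc(void** devPtr, size_t size).
// Returns 0 on success, mirroring cudaSuccess == 0.
int fakeMalloc(void** ptr, std::size_t size) {
    *ptr = std::malloc(size);
    return *ptr != nullptr ? 0 : 1;
}

// Template overload in the spirit of the one cuda_runtime.h provides:
// any T** is accepted, so callers never write a cast themselves.
template <class T>
int fakeMalloc(T** ptr, std::size_t size) {
    void* raw = nullptr;
    int status = fakeMalloc(&raw, size);  // resolves to the void** overload
    *ptr = static_cast<T*>(raw);          // void* -> T* is a valid static_cast
    return status;
}

// Hypothetical stand-in for a copy routine taking void* / const void*.
void fakeCopy(void* dst, const void* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
}

// Writes `imag` into the imaginary part of element i without an explicit cast:
// &data[i].y is a double*, which converts implicitly to void*.
void setImag(Complex* data, std::size_t i, double imag) {
    assert(data != nullptr);
    fakeCopy(&data[i].y, &imag, sizeof(double));
}
```

The one unavoidable conversion lives in a single template, so call sites stay explicit about their types without a C-style cast.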

1 Answer


You cannot do arithmetic on void*, since pointer arithmetic is based on the size of the pointed-to object (and sizeof(void) does not really mean anything).

Your code probably compiles thanks to a compiler extension that treats arithmetic on void* as arithmetic on char* (GCC has such an extension, for example, and reports its use under -Wpointer-arith).
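Both rules can be seen in a small host-only check. Complex is a hypothetical stand-in mirroring double2, and the helper names are illustrative. Arithmetic on a Complex* moves in steps of sizeof(Complex); moving by a byte count portably means going through a character pointer, which is what the extension does silently for void*. As noted in the question, static_cast<char*> does not compile for a cufftDoubleComplex*, but reinterpret_cast<char*> does, and offsetof gives the member's byte offset instead of a hard-coded sizeof(double).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in mirroring double2 / cufftDoubleComplex.
struct Complex {
    double x, y;
};

// Typed arithmetic: p + n advances n * sizeof(Complex) bytes.
std::ptrdiff_t bytesBetweenElements(Complex* p, std::size_t n) {
    const char* a = reinterpret_cast<const char*>(p);
    const char* b = reinterpret_cast<const char*>(p + n);
    return b - a;
}

// Byte arithmetic done portably through char* (the void* extension does this implicitly).
// static_cast<char*>(p) would not compile here; reinterpret_cast<char*> does.
void* imagByByteOffset(Complex* p) {
    assert(p != nullptr);
    char* bytes = reinterpret_cast<char*>(p);
    return bytes + offsetof(Complex, y);  // char* converts implicitly to void*
}
```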

In your case, you probably do not need pointer arithmetic at all; the following should work (and is more robust):

copyResult &= cudaMemcpy2D(static_cast<void*>(&inputData->y),
                           sizeof (cufftDoubleComplex),
                           static_cast<const void*>(dataImag), sizeof(double),
                           sizeof(double), len, cudaMemcpyHostToDevice);

This works because cufftDoubleComplex is a typedef (via cuDoubleComplex) for double2, which is simply:

struct __device_builtin__ __builtin_align__(16) double2
{
    double x, y;
};
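To see why these arguments interleave the two arrays, here is a host-only sketch of the copy pattern cudaMemcpy2D performs: height rows of width bytes, with consecutive rows spitch bytes apart in the source and dpitch bytes apart in the destination. memcpy2DHost and interleave are hypothetical helpers written for illustration (they are not part of CUDA and do no device transfer), and Complex again stands in for double2. With width = sizeof(double), spitch = sizeof(double), dpitch = sizeof(Complex) and height = len, each source double lands in one struct member, and starting the second copy at &out[0].y avoids any void* arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in mirroring double2 / cufftDoubleComplex.
struct Complex {
    double x, y;
};

// Hypothetical host-only helper mimicking cudaMemcpy2D's layout rules:
// copy `height` rows of `width` bytes; source rows are `spitch` bytes apart,
// destination rows are `dpitch` bytes apart.
void memcpy2DHost(void* dst, std::size_t dpitch,
                  const void* src, std::size_t spitch,
                  std::size_t width, std::size_t height) {
    assert(width <= dpitch && width <= spitch);  // the real call rejects width > pitch
    char* d = static_cast<char*>(dst);
    const char* s = static_cast<const char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, width);
    }
}

// Interleaves separate real/imaginary arrays the way the two cudaMemcpy2D calls do.
std::vector<Complex> interleave(const double* re, const double* im, std::size_t len) {
    std::vector<Complex> out(len);
    if (len == 0) return out;
    memcpy2DHost(&out[0].x, sizeof(Complex), re, sizeof(double), sizeof(double), len);
    memcpy2DHost(&out[0].y, sizeof(Complex), im, sizeof(double), sizeof(double), len);
    return out;
}
```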