0
votes

I am trying to parallelize a C function using CUDA. Several structs are passed to this function as pointers. To use unified memory, I have identified the relevant malloc() calls and replaced them with cudaMallocManaged().

But now there is an allocation using memalign(). I want to achieve for it the same thing that cudaMallocManaged() did for the malloc() calls.

Does such an equivalent exist? If not, what needs to be done?

This is how the memalign() allocation line looks:

float *data = (float*) memalign(16, some_integer*sizeof(float));
1
According to the CUDA C Programming Guide, memory allocated with the CUDA allocation functions is always aligned to at least 256 bytes. As far as I know, you cannot specify other alignments. - havogt
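
To make the comment's point concrete: an address that is a multiple of 256 is also a multiple of 16, so a pointer returned by cudaMallocManaged() should already satisfy the alignment requested by memalign(16, ...). The following is only a minimal sketch to check this on your own system; the helper is_aligned and the size n are hypothetical names, not part of the original code.

```cuda
#include <cstdio>
#include <cstdint>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: true if ptr is a multiple of `alignment` bytes.
static bool is_aligned(const void *ptr, size_t alignment) {
    return reinterpret_cast<uintptr_t>(ptr) % alignment == 0;
}

int main() {
    const size_t n = 1000;  // stands in for some_integer in the question
    float *data = nullptr;

    // Managed allocation in place of memalign(16, n * sizeof(float)).
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Report whether the 16-byte requirement of the original code is met.
    printf("16-byte aligned: %s\n", is_aligned(data, 16) ? "yes" : "no");

    cudaFree(data);
    return 0;
}
```

If the check passes, the memalign() call can be replaced with cudaMallocManaged() in the same way as the malloc() calls.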

1 Answer

2
votes

You should be able to register an existing host memory buffer like this:

float *data = (float*) memalign(16, some_integer*sizeof(float));
cudaHostRegister((void *)data, some_integer*sizeof(float), cudaHostRegisterDefault);

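After registration, the buffer is page-locked and, on systems with unified virtual addressing, accessible from the device, so it can be used much like memory allocated with cudaMallocManaged. Note that it remains pinned host memory and is not migrated to the GPU the way managed memory is. Check the return value of the cudaHostRegister call; if it fails, the alignment you chose may be incompatible.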
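
A fuller sketch of the registration approach, with error checking, might look like the following. The names register_and_scale and scale, the size n, and the scaling factor are hypothetical and only serve the illustration. The sketch uses cudaHostRegisterMapped instead of cudaHostRegisterDefault so that the mapping does not depend on unified virtual addressing, and it obtains the device-side pointer explicitly with cudaHostGetDevicePointer. It assumes glibc, where memalign() is declared in malloc.h.

```cuda
#include <cstdio>
#include <cstdlib>
#include <malloc.h>        // memalign (glibc; an assumption about the platform)
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each element by `factor`.
__global__ void scale(float *x, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

// Hypothetical helper: registers an existing host buffer, runs the kernel
// on it through the mapped device pointer, then unregisters it.
cudaError_t register_and_scale(float *data, size_t n, float factor) {
    const size_t bytes = n * sizeof(float);

    // Page-lock the buffer and map it into the device address space.
    cudaError_t err = cudaHostRegister(data, bytes, cudaHostRegisterMapped);
    if (err != cudaSuccess) return err;

    // Device-side pointer for the registered host buffer.
    float *d_data = nullptr;
    err = cudaHostGetDevicePointer((void **)&d_data, data, 0);
    if (err == cudaSuccess) {
        const unsigned threads = 256;
        const unsigned blocks = (unsigned)((n + threads - 1) / threads);
        scale<<<blocks, threads>>>(d_data, n, factor);
        err = cudaGetLastError();                         // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
    }

    cudaHostUnregister(data);
    return err;
}

int main() {
    const size_t n = 1024;  // stands in for some_integer
    float *data = (float *) memalign(16, n * sizeof(float));
    if (!data) return 1;
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    cudaError_t err = register_and_scale(data, n, 2.0f);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        free(data);
        return 1;
    }

    printf("data[0] = %f\n", data[0]);
    free(data);
    return 0;
}
```

Because the buffer stays in host memory, every kernel access goes over the bus. For data that the GPU touches repeatedly, switching the allocation to cudaMallocManaged may perform better.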