I tried the code in this link Is CUDA pinned memory zero-copy? The one who asked claims the program worked fine for him But does not work the same way on mine the values does not change if I manipulate them in the kernel.
Basically my problem is, my GPU memory is not enough but I want to do calculations which require more memory. I my program to use RAM memory, or host memory and be able to use CUDA for calculations. The program in the link seemed to solve my problem but the code does not give output as shown by the guy.
Any help or any working example on Zero copy memory would be useful.
Thank you
__global__ void testPinnedMemory(double * mem)
{
double currentValue = mem[threadIdx.x];
printf("Thread id: %d, memory content: %f\n", threadIdx.x, currentValue);
mem[threadIdx.x] = currentValue+10;
}
void test()
{
const size_t THREADS = 8;
double * pinnedHostPtr;
cudaHostAlloc((void **)&pinnedHostPtr, THREADS, cudaHostAllocDefault);
//set memory values
for (size_t i = 0; i < THREADS; ++i)
pinnedHostPtr[i] = i;
//call kernel
dim3 threadsPerBlock(THREADS);
dim3 numBlocks(1);
testPinnedMemory<<< numBlocks, threadsPerBlock>>>(pinnedHostPtr);
//read output
printf("Data after kernel execution: ");
for (int i = 0; i < THREADS; ++i)
printf("%f ", pinnedHostPtr[i]);
printf("\n");
}
8*sizeof(double)
bytes – ebarrsize_t
does not do that. – Robert Crovella