3
votes

As far as I know, we can allocate a pinned memory area within kernel memory (as in KGPU).

We can then place Linux kernel data in that pinned memory and transfer it to the GPU.

The problem is that the Linux kernel data has to be arranged as an array.

In my case, the data structure is a tree.

I have tried passing it from pinned memory to the GPU, but when one node accesses the next node, a memory access error occurs.

I was wondering whether Unified Memory can be allocated as a pinned memory area in kernel memory, so that the tree can be built in the Unified Memory area and used by the GPU without another runtime API call such as cudaMallocManaged.

Or must Unified Memory only be allocated with cudaMallocManaged?


1 Answer

1
votes

But when one node accesses the next node, a memory access error occurs.

This just means you have a bug in your code.

Or must Unified Memory only be allocated with cudaMallocManaged?

Currently, the only way to access the features of Unified Memory is to use a managed allocator. For dynamic allocations, that is cudaMallocManaged(). For static allocations, it is via the __managed__ keyword.

The programming guide has additional information.
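As a minimal sketch of the two managed-allocation paths, something like the following would let both host and device code dereference the same pointers (the Node layout here is just an assumed example, not code from the question):

#include <cstdio>

struct Node {        // assumed example node layout
  int id;
  Node *next;
};

// Static managed allocation via the __managed__ keyword:
__managed__ Node g_node;

__global__ void show(Node *n){
  // Both the dynamically and statically managed nodes are directly
  // dereferenceable in device code.
  printf("dynamic node id: %d\n", n->id);
  printf("static node id:  %d\n", g_node.id);
}

int main(){
  Node *n = nullptr;
  // Dynamic managed allocation via cudaMallocManaged():
  cudaMallocManaged(&n, sizeof(Node));
  n->id = 1;
  n->next = nullptr;
  g_node.id = 2;
  show<<<1,1>>>(n);
  cudaDeviceSynchronize();
  cudaFree(n);
  return 0;
}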

In response to the comments below, here is a trivial worked example of creating a singly-linked list using pinned memory, and traversing that list in device code:

$ cat t1115.cu
#include <stdio.h>
#define NUM_ELE 5

struct ListElem{

   int id;
   bool last;
   ListElem *next;
};

__global__ void test_kernel(ListElem *list){

  int count = 0;
  while (!(list->last)){
    printf("List element %d has id %d\n", count++, list->id);
    list = list->next;}
  printf("List element %d is the last item in the list\n", count);
}

int main(){
  ListElem *h_list, *my_list;
  cudaHostAlloc(&h_list, sizeof(ListElem), cudaHostAllocDefault);
  my_list = h_list;
  for (int i = 0; i < NUM_ELE-1; i++){
    my_list->id = i+101;
    my_list->last = false;
    cudaHostAlloc(&(my_list->next), sizeof(ListElem), cudaHostAllocDefault);
    my_list = my_list->next;}
  my_list->last = true;
  test_kernel<<<1,1>>>(h_list);
  cudaDeviceSynchronize();
}

$ nvcc -o t1115 t1115.cu
$ cuda-memcheck ./t1115
========= CUDA-MEMCHECK
List element 0 has id 101
List element 1 has id 102
List element 2 has id 103
List element 3 has id 104
List element 4 is the last item in the list
========= ERROR SUMMARY: 0 errors
$

Note that in the interest of brevity of presentation, I have dispensed with proper CUDA error checking in this example (although running the code with cuda-memcheck demonstrates there are no CUDA runtime errors), but I recommend it any time you are having trouble with CUDA code. Also note that this example assumes a proper UVA environment.
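For reference, the error checking mentioned above is commonly done with a small wrapper macro around each runtime call; here is one minimal sketch (the macro name is just a placeholder):

#include <cstdio>
#include <cstdlib>

// Print the failing call's error string and abort on any CUDA runtime error.
#define cudaCheck(call)                                               \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                    \
              cudaGetErrorString(err_), __FILE__, __LINE__);          \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// Example usage with the listing above:
// cudaCheck(cudaHostAlloc(&h_list, sizeof(ListElem), cudaHostAllocDefault));
// cudaCheck(cudaDeviceSynchronize());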