But when a node accesses the next node, a memory access error occurs.
This just means you have a bug in your code.
Or is it that Unified Memory must only be used via cudaMallocManaged?
Currently, the only way to access the features of Unified Memory is to use a managed allocator. For dynamic allocations, that is cudaMallocManaged(); for static allocations, it is the __managed__ keyword.
The programming guide has additional information.
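As a point of reference, here is a minimal sketch of both allocation styles (the variable and kernel names are illustrative only, not taken from the question):

#include <stdio.h>

__managed__ int m_static;                       // static managed allocation via __managed__ (zero-initialized)

__global__ void bump(int *p){
  (*p)++;                                       // device writes the dynamically managed int
  m_static++;                                   // device writes the statically managed int
}

int main(){
  int *m_dynamic;
  cudaMallocManaged(&m_dynamic, sizeof(int));   // dynamic managed allocation
  *m_dynamic = 0;                               // host can touch it directly, no cudaMemcpy needed
  bump<<<1,1>>>(m_dynamic);
  cudaDeviceSynchronize();                      // synchronize before the host touches managed data again
  printf("dynamic: %d static: %d\n", *m_dynamic, m_static);
  cudaFree(m_dynamic);
}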
In response to the comments below, here is a trivial worked example of creating a singly-linked list using pinned memory, and traversing that list in device code:
$ cat t1115.cu
#include <stdio.h>
#define NUM_ELE 5

struct ListElem{
  int id;
  bool last;
  ListElem *next;
};

// walk the host-pinned list directly from device code
__global__ void test_kernel(ListElem *list){
  int count = 0;
  while (!(list->last)){
    printf("List element %d has id %d\n", count++, list->id);
    list = list->next;
  }
  printf("List element %d is the last item in the list\n", count);
}

int main(){
  ListElem *h_list, *my_list;
  // each node lives in pinned (page-locked) host memory
  cudaHostAlloc(&h_list, sizeof(ListElem), cudaHostAllocDefault);
  my_list = h_list;
  for (int i = 0; i < NUM_ELE-1; i++){
    my_list->id = i+101;
    my_list->last = false;
    cudaHostAlloc(&(my_list->next), sizeof(ListElem), cudaHostAllocDefault);
    my_list = my_list->next;
  }
  my_list->last = true;                 // terminal node
  test_kernel<<<1,1>>>(h_list);         // pass the host pointer directly (relies on UVA)
  cudaDeviceSynchronize();
}
$ nvcc -o t1115 t1115.cu
$ cuda-memcheck ./t1115
========= CUDA-MEMCHECK
List element 0 has id 101
List element 1 has id 102
List element 2 has id 103
List element 3 has id 104
List element 4 is the last item in the list
========= ERROR SUMMARY: 0 errors
$
Note that in the interest of brevity I have dispensed with proper CUDA error checking in this example (although running the code with cuda-memcheck demonstrates there are no CUDA run-time errors), but I recommend it any time you are having trouble with a CUDA code. Also note that this example assumes a proper UVA environment.
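For completeness, the error checking referred to above is usually done by wrapping each runtime API call. A minimal sketch might look like the following (the macro name is my own, not anything required by CUDA):

#include <stdio.h>
#include <stdlib.h>

#define cudaCheck(call)                                               \
  do {                                                                \
    cudaError_t err = (call);                                         \
    if (err != cudaSuccess) {                                         \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                    \
              cudaGetErrorString(err), __FILE__, __LINE__);           \
      exit(1);                                                        \
    }                                                                 \
  } while (0)

// usage in the example above, e.g.:
//   cudaCheck(cudaHostAlloc(&h_list, sizeof(ListElem), cudaHostAllocDefault));
//   cudaCheck(cudaDeviceSynchronize());   // also surfaces kernel execution errors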