Use of cudamalloc(). Why the double pointer?

Question

I am currently going through the tutorial examples on http://code.google.com/p/stanford-cs193g-sp2010/ to learn CUDA. The code which demostrates __global__ functions is given below. It simply creates two arrays, one on the CPU and one on the GPU, populates the GPU array with the number 7 and copies the GPU array data into the CPU array.

#include <stdlib.h>
#include <stdio.h>

__global__ void kernel(int *array)
{
  int index = blockIdx.x * blockDim.x + threadIdx.x;

  array[index] = 7;
}

int main(void)
{
  int num_elements = 256;

  int num_bytes = num_elements * sizeof(int);

  // pointers to host & device arrays
  int *device_array = 0;
  int *host_array = 0;

  // malloc a host array
  host_array = (int*)malloc(num_bytes);

  // cudaMalloc a device array
  cudaMalloc((void**)&device_array, num_bytes);

  int block_size = 128;
  int grid_size = num_elements / block_size;

  kernel<<<grid_size,block_size>>>(device_array);

  // download and inspect the result on the host:
  cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);

  // print out the result element by element
  for(int i=0; i < num_elements; ++i)
  {
    printf("%d ", host_array[i]);
  }

  // deallocate memory
  free(host_array);
  cudaFree(device_array);
}

My question is why have they worded the cudaMalloc((void**)&device_array, num_bytes); statement with a double pointer? Even here definition of cudamalloc() on says the first argument is a double pointer.

Why not simply return a pointer to the beginning of the allocated memory on the GPU, just like the malloc function does on the CPU?

Because it returns an error code that tells you why it failed. Returning a null pointer on failure, like malloc(), is a poor substitute for an error code, doesn't mean anything more than "it didn't work". You are supposed to check it. — Hans Passant
@Hans: It's still a horrible API design. Instead, it should take an extra int *error argument to store the error code, which will be valid when the return value is a null pointer. As-is, the design negates all benefits of void pointers and requires you to jump through hoops to use the function correctly. — R.. GitHub STOP HELPING ICE
@R.: you get credit for offering an alternative at the same time you criticize the API - most API critics winge without proposing alternatives - but unless you believe that every CUDA runtime call should take an additional int * to pass back an error code, (which would make for a much more cluttered and difficult-to-use API), your alternative proposal is not orthogonal and violates the principle of least astonishment. — ArchaeaSoftware
Violating the principle of least astonishment is a much smaller offense than requiring temp void *'s all over the place. The int *error could be null when the user does not care about the reason. Actually I see no reason allocation could fail other than "out of memory" (and more importantly no reason the caller could care why it failed), so it's probably just a design mistake to begin with. — R.. GitHub STOP HELPING ICE
possible duplicate of Why does cudaMalloc() use pointer to pointer? — chappjc

CygnusX1 CygnusX1 · Accepted Answer · 2011-11-03T07:19:47

All CUDA API functions return an error code (or cudaSuccess if no error occured). All other parameters are passed by reference. However, in plain C you cannot have references, that's why you have to pass an address of the variable that you want the return information to be stored. Since you are returning a pointer, you need to pass a double-pointer.

Another well-known function which operates on addresses for the same reason is the scanf function. How many times have you forgotten to write this & before the variable that you want to store the value to? ;)

int i;
scanf("%d",&i);

Use of cudamalloc(). Why the double pointer?

5 Answers

how the double pointer works

Why do we need the double pointer? Why this does work