CUDA: illegal combination of memory qualifiers

Question

I have the following code:

main.cu:

#include "class.h"
int main () {}

class.h:

class Class {
    __global__
    void Function() {};
};

When I compile this code using the command nvcc -c main.cu -o main.o, I get the following errors:

class.h(3): warning: inline qualifier ignored for "global" function
class.h(3): error: illegal combination of memory qualifiers

I have a question about each of these errors. Why does it "ignore" the __global__ qualifier for the function, and why is the __global__ memory qualifier illegal in this context? I have read in the documentation that

E.2.10.2. Function Members
Static member functions cannot be __global__ functions.

However, my function is not a static member, as far as I know. Removing the __global__ line allows it to compile, and so does moving the __global__ and void Function(); lines into main.cu. If this actually ISN'T allowed, why does CUDA force this limitation, and what is a way to get around this while still maintaining structured code?

To clarify, I know no other way to make classes that have functions which can create GPU kernels. It seems to me like kernels can only be created from global functions in main.cu. I am fairly new to CUDA programming, so I may just be missing some CUDA conventions which may have been unclear to me. If this is the case, then please let me know so I can keep up with proper programming practice.

The short answer is no, you cannot do this. If you google "cuda global class member" you'll find a number of treatments of this, including SO questions like here and here; your question is arguably a duplicate of those. As a simple suggestion, you could wrap your CUDA kernels in host-callable class member functions, to "keep up with proper programming practice." — Robert Crovella
Hi @Robert, thank you for your comment. I just want to make sure I understand your suggestion. Are you recommending that I create a __host__ function in my class, and that the implementation of that function calls a __global__ function? If so, what scope should the __global__ function be in? — Simon Ewing
Yes, to the first question. Not sure I understand the 2nd question. Is there any lack of clarity around what scope the implementation of a class member function should be in? — Robert Crovella
Ok, I may understand now. The __global__ qualifier is not independent from the idea of the global scope. That is, any __global__ function must be a global function, in that it is defined in the global scope. I was under the understanding that __global__ strictly meant that the function was defined on both the host and device, whereas a function in the global scope is accessible from any object on the host. If this explanation is correct, please post your answer as a formal answer so I can accept it. — Simon Ewing

yhf8377 · Accepted Answer · 2016-12-12T14:22:09

My understanding is that you want to use CUDA kernels in an OOP fashion. If that is the case, the class structure below should work:

// myclass.h
class MyClass {
    public:
        void call_kernel( ... );
};

// myclass.cu
__global__
void my_kernel( ... ) {
    // do some work
}

void MyClass::call_kernel( ... ) {
    // prepare data for the kernel, e.g. allocating memory, copying from host to device, etc.

    // run kernel
    my_kernel <<< ... >>>( ... );

    // copy results from device to host, clean up, etc.
}

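Please note that if you have multiple classes containing kernel code, their source files should all use the .cu extension so that nvcc compiles them as CUDA code. If device code in one file needs to call device functions or access device variables defined in another file, you should also enable separate compilation (relocatable device code, nvcc -rdc=true).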
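
To make the pattern above concrete, here is a minimal sketch of a class whose host member function wraps a kernel launch. The names Scaler, scale_kernel and apply are hypothetical and chosen only for illustration, and the block size of 256 is an arbitrary assumption; this is one possible way to fill in the elided parts, not the only one.

```cuda
// scaler.cu -- illustrative sketch; Scaler, scale_kernel and apply are
// hypothetical names, not part of the original question or answer.
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// The kernel is a free (non-member) function: __global__ is not allowed
// on class member functions.
__global__ void scale_kernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

class Scaler {
public:
    explicit Scaler(float factor) : factor_(factor) {}

    // Ordinary host member function that hides the kernel launch.
    // Returns false if any CUDA runtime call reports an error.
    bool apply(std::vector<float>& values) const {
        if (values.empty()) return true;

        const int n = static_cast<int>(values.size());
        const std::size_t bytes = values.size() * sizeof(float);

        // Allocate device memory and copy the input to the device.
        float* d_data = nullptr;
        if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;
        bool ok = cudaMemcpy(d_data, values.data(), bytes,
                             cudaMemcpyHostToDevice) == cudaSuccess;

        if (ok) {
            // Launch enough blocks to cover all n elements.
            const int threads = 256;  // assumed block size
            const int blocks = (n + threads - 1) / threads;
            scale_kernel<<<blocks, threads>>>(d_data, factor_, n);

            // Check the launch, then copy the results back to the host.
            ok = cudaGetLastError() == cudaSuccess &&
                 cudaMemcpy(values.data(), d_data, bytes,
                            cudaMemcpyDeviceToHost) == cudaSuccess;
        }

        cudaFree(d_data);
        return ok;
    }

private:
    float factor_;
};
```

Callers only see an ordinary C++ class: constructing Scaler doubler(2.0f) and calling doubler.apply(v) on a std::vector<float> holding 1, 2, 3 should leave 2, 4, 6 in the vector on a machine with a working CUDA device. In a larger project the class declaration would go in a header, as in myclass.h above, with the kernel and the member function definition staying in the .cu file.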
