I have a simple Metal compute kernel that I'm trying to port to CUDA. The Metal kernel source is:
#include <metal_stdlib>
using namespace metal;
constant uint stride [[function_constant(0)]];
constant float dt [[function_constant(1)]];
constant float a [[function_constant(2)]];
constant float b [[function_constant(3)]];
float2 f(const float2 x) {
    return float2(a, -b)*x.yx;
}
kernel void harmonic_occilator_stride(device float2 *x [[buffer(0)]],
                                      uint i [[thread_position_in_grid]]) {
    for (uint j = 0; j < stride; j++) {
        x[i] += dt*f(x[i]);
    }
}
My first attempt at converting this to CUDA produces a number of errors when compiling to PTX:
__constant__ uint stride;
__constant__ float dt;
__constant__ float a;
__constant__ float b;
__device__ float2 f(const float2 x) {
    return float2(a, -b)*x.yx;
}
extern "C" __global__ void harmonic_occilator_stride(float2 *x) {
    size_t i = blockIdx.x*blockDim.x + threadIdx.x;
    for (uint j = 0; j < stride; j++) {
        x[i] += dt*f(x[i]);
    }
}
The first thing it doesn't like is x.yx. In Metal, this reverses the order of the components of the float2. How do I reverse or otherwise change the component order of a vector in CUDA?
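To make the intent concrete: CUDA's float2 is a plain struct with x and y members and has no swizzle syntax, so as far as I can tell the reversal has to be spelled out by hand. A minimal sketch, assuming a hypothetical helper named swap_xy (not part of CUDA); the #ifndef block is only a host-side stand-in so the snippet also builds with an ordinary C++ compiler:

```cpp
#ifndef __CUDACC__
// Host-only stand-ins (assumption: used only when not compiling with nvcc,
// where float2 and make_float2 are provided by the CUDA headers).
#define __host__
#define __device__
struct float2 { float x, y; };
inline float2 make_float2(float x, float y) { return float2{x, y}; }
#endif

// Hypothetical helper: hand-written equivalent of Metal's v.yx swizzle.
__host__ __device__ inline float2 swap_xy(float2 v) {
    return make_float2(v.y, v.x);
}
```

With this in place, x.yx in the Metal source would become swap_xy(x).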
The next thing it doesn't like is float2(a, -b). This gives the error "no suitable constructor to convert between float and float2". How do I construct a vector literal?
The last thing it complains about is that there is no * operator for float and float2 on the line dt*f(x[i]). If I remove the dt* and just write x[i] += f(x[i]), it complains that there is no += operator for float2 and float2. How do I perform arithmetic on these types, and can I multiply vectors by scalars?
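For reference, here is roughly what I am after, written with hand-rolled pieces. CUDA does provide make_float2 for building a float2 from two scalars, but the built-in vector types come without arithmetic operators, so this sketch defines the three overloads the kernel needs. The CUDA samples ship a helper_math.h header with similar overloads, which may be a better choice than writing your own. Passing a, b and dt as parameters (rather than reading __constant__ variables) and the function name euler_step are assumptions made so the snippet is self-contained:

```cpp
#ifndef __CUDACC__
// Host-only stand-ins (assumption: used only when not compiling with nvcc,
// where float2 and make_float2 are provided by the CUDA headers).
#define __host__
#define __device__
struct float2 { float x, y; };
inline float2 make_float2(float x, float y) { return float2{x, y}; }
#endif

// Component-wise product of two vectors.
__host__ __device__ inline float2 operator*(float2 u, float2 v) {
    return make_float2(u.x * v.x, u.y * v.y);
}

// Scalar times vector.
__host__ __device__ inline float2 operator*(float s, float2 v) {
    return make_float2(s * v.x, s * v.y);
}

// In-place vector addition.
__host__ __device__ inline float2 &operator+=(float2 &u, float2 v) {
    u.x += v.x;
    u.y += v.y;
    return u;
}

// Right-hand side of the oscillator: float2(a, -b) * x.yx in Metal terms.
__host__ __device__ inline float2 f(float2 x, float a, float b) {
    return make_float2(a, -b) * make_float2(x.y, x.x);
}

// One explicit Euler step, i.e. the body of the kernel's inner loop.
// euler_step is a hypothetical name used only for this illustration.
__host__ __device__ inline void euler_step(float2 &x, float dt, float a, float b) {
    x += dt * f(x, a, b);
}
```

Inside the kernel, the loop body would then read euler_step(x[i], dt, a, b).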
In Metal, when I declare a value as a function_constant, the Metal compiler JIT-compiles a specialized, optimized version of the kernel when the kernel function is loaded at runtime. Does CUDA have this functionality?
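The closest analogue I am aware of is runtime compilation with NVRTC, where the constants are baked in as preprocessor macros through -D options and the kernel source uses those macros instead of __constant__ variables. The host-side part of that idea can be sketched as below. The macro names STRIDE, DT, A and B and both helper functions are hypothetical, and this only builds the option strings; it does not call NVRTC itself:

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Format a float as a C float literal such as 1.00000000e+00f.
// Scientific notation always contains a '.' and an exponent, so appending
// the 'f' suffix yields a valid literal, and 9 significant digits are
// enough to round-trip a 32-bit float.
inline std::string float_literal(float v) {
    std::ostringstream out;
    out << std::scientific << std::setprecision(8) << v << 'f';
    return out.str();
}

// Hypothetical helper: one -D option per Metal function constant.
inline std::vector<std::string> specialization_options(unsigned stride,
                                                       float dt,
                                                       float a,
                                                       float b) {
    return {
        "-DSTRIDE=" + std::to_string(stride),
        "-DDT=" + float_literal(dt),
        "-DA=" + float_literal(a),
        "-DB=" + float_literal(b),
    };
}
```

The resulting strings would be passed, as an array of const char pointers, to nvrtcCompileProgram together with a program created by nvrtcCreateProgram, producing a PTX image specialized for those values.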
extern "C". If you want C-style linkability, I recommend creating ordinary (i.e. not decorated with __device__ or __global__) wrapper functions that call the kernel for you. Define those wrapper functions in the same module as the kernel definition, and call them from other modules that need C-linkability. – Robert Crovella
. But that is neither here nor there. – Robert Crovella