The first command, taken straight from the manual, works. The second does not seem to recognize the .cc file as CUDA, even though I pass the -xcuda flag.
clang++ apxy.cu --cuda-gpu-arch=sm_61 -L/usr/local/cuda/lib64 -lcudart_static -ldl -lrt -pthread
clang++ apxy.cc -xcuda --cuda-gpu-arch=sm_61 -L/usr/local/cuda/lib64 -lcudart_static -ldl -lrt -pthread
apxy.cc:3:1: error: unknown type name '__global__'
__global__ void axpy(float a, float* x, float* y) {
^
apxy.cc:3:12: error: expected unqualified-id
__global__ void axpy(float a, float* x, float* y) {
           ^
apxy.cc:17:3: error: use of undeclared identifier 'cudaMalloc'
cudaMalloc(&device_x, kDataLen * sizeof(float));