I'm trying to call a cuBLAS function from within a kernel in the following way:
__device__ void doLinear(const float *W, const float *input, unsigned i, float *out, unsigned o) {
    unsigned idx = blockIdx.x * blockDim.x + threadIdx.x;
    const float alpha = 1.0f;
    const float beta = 0.0f;
    // Only one thread issues the cuBLAS call.
    if (idx == 0) {
        cublasHandle_t cnpHandle;
        cublasStatus_t status = cublasCreate(&cnpHandle);
        if (status == CUBLAS_STATUS_SUCCESS) {
            // out = W * input; W is o x i, column-major, so lda must be at least o.
            cublasSgemv(cnpHandle, CUBLAS_OP_N, o, i, &alpha, W, o, input, 1, &beta, out, 1);
            cublasDestroy(cnpHandle);
        }
    }
    __syncthreads();
}
This function works perfectly well if the input pointer is allocated using cudaMalloc.
My issue is that if the input pointer actually points to shared memory containing data generated from within the kernel, I get the error CUDA_EXCEPTION_14 - Warp Illegal address.
Is it not possible to pass pointers to shared memory to a cuBLAS function that is called from a kernel?
What is the correct way to allocate my memory here? (At the moment I'm just doing another cudaMalloc and using that buffer as my 'shared' memory, but it's making me feel a bit dirty.)
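For reference, here is a minimal sketch of the workaround I'm describing, not my real code. It assumes a single thread block and a column-major W. The names stagedLinear, gemvGlobal and scratch are made up for illustration, and a hand-written loop stands in for the cublasSgemv call so that the example builds without the device cuBLAS library. The data is generated in shared memory, copied into a cudaMalloc'd scratch buffer, and only the global pointer is handed to the matrix-vector routine.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the device-side cublasSgemv call: computes
// out = W * x for a column-major o x i matrix W. All pointers are global memory.
__device__ void gemvGlobal(const float *W, const float *x, unsigned i,
                           float *out, unsigned o) {
    for (unsigned r = 0; r < o; ++r) {
        float acc = 0.0f;
        for (unsigned c = 0; c < i; ++c)
            acc += W[c * o + r] * x[c];
        out[r] = acc;
    }
}

// Single-block kernel: builds the input in shared memory, stages it in a
// global scratch buffer, then lets thread 0 run the matrix-vector product.
__global__ void stagedLinear(const float *W, float *scratch, unsigned i,
                             float *out, unsigned o) {
    extern __shared__ float sInput[];
    for (unsigned k = threadIdx.x; k < i; k += blockDim.x)
        sInput[k] = static_cast<float>(k + 1);  // data generated in-kernel
    __syncthreads();
    for (unsigned k = threadIdx.x; k < i; k += blockDim.x)
        scratch[k] = sInput[k];                 // shared -> global copy
    __syncthreads();  // make the global writes visible to the whole block
    if (threadIdx.x == 0)
        gemvGlobal(W, scratch, i, out, o);
}

int main() {
    const unsigned i = 3, o = 2;
    // Column-major storage of the 2 x 3 matrix [[1, 2, 3], [4, 5, 6]].
    const float hW[o * i] = {1, 4, 2, 5, 3, 6};
    float *dW, *dScratch, *dOut, hOut[o];
    cudaMalloc(&dW, sizeof(hW));
    cudaMalloc(&dScratch, i * sizeof(float));
    cudaMalloc(&dOut, o * sizeof(float));
    cudaMemcpy(dW, hW, sizeof(hW), cudaMemcpyHostToDevice);
    // Third launch parameter sizes the dynamic shared-memory array sInput.
    stagedLinear<<<1, 32, i * sizeof(float)>>>(dW, dScratch, i, dOut, o);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    printf("%g %g\n", hOut[0], hOut[1]);  // input is [1, 2, 3]
    cudaFree(dW);
    cudaFree(dScratch);
    cudaFree(dOut);
    return 0;
}
```

The extra copy and the scratch allocation are exactly the part I would like to avoid if shared memory can be used directly.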