3 votes

I have an application that allocates a matrix and a vector on the device using cudaMalloc/cudaMemcpy. The matrix is stored column-major. I would now like to use a function from the cuBLAS library (cublasSgemv) to multiply them. It appears that I will have to allocate duplicates of the matrix and vector using cudaMalloc and initialize them from the host with cublasSetMatrix/cublasSetVector in order to use the cuBLAS API function. Obviously, duplicating all of this memory would be costly.

To my understanding, the cublasSetMatrix/cublasSetVector functions are just light wrappers around cudaMemcpy. Is it possible to pass the pointers to the arrays I initialized with cudaMemcpy directly to the cuBLAS API function? Or, failing that, can I wrap the arrays in some way that the API will recognize, so that I can avoid the memory duplication?

1
Is it too expensive to treat matrices as plain arrays for your purposes? - Alejandro Sazo

1 Answer

3 votes

Yes, you can use cudaMemcpy instead of cublasSetMatrix/cublasGetMatrix. cuBLAS works with any valid device pointer, regardless of whether it was filled by cudaMemcpy or by the cublasSet* helpers, so there is no need to duplicate the data.
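For example, a minimal sketch of cublasSgemv operating on memory populated with plain cudaMemcpy (assuming the cuBLAS v2 API and a small 2x3 column-major matrix; error checking omitted for brevity):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    const int m = 2, n = 3;
    /* Column-major 2x3 matrix A = [[1,2,3],[4,5,6]] and vector x. */
    float hA[] = {1.f, 4.f,  2.f, 5.f,  3.f, 6.f};  /* stored column by column */
    float hx[] = {1.f, 1.f, 1.f};
    float hy[2];

    float *dA, *dx, *dy;
    cudaMalloc((void**)&dA, m * n * sizeof(float));
    cudaMalloc((void**)&dx, n * sizeof(float));
    cudaMalloc((void**)&dy, m * sizeof(float));

    /* Plain cudaMemcpy -- no cublasSetMatrix/cublasSetVector required. */
    cudaMemcpy(dA, hA, m * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    /* y = alpha * A * x + beta * y; lda = m because A is column-major. */
    cublasSgemv(handle, CUBLAS_OP_N, m, n, &alpha, dA, m, dx, 1, &beta, dy, 1);

    cudaMemcpy(hy, dy, m * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%g %g\n", hy[0], hy[1]);  /* 6 15 */

    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

The cublasSetMatrix/cublasSetVector calls mainly add convenience for strided copies (leading dimension on host differing from device); for a contiguous column-major matrix like this one they reduce to the same transfer.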