Currently I'm trying to implement a simple Linear Regression algorithm in matrix form based on cuBLAS with CUDA. Matrix multiplication and transposition work well with the cublasSgemm function.
The problems begin with matrix inversion, based on the cublas<t>getrfBatched() and cublas<t>getriBatched() functions (see here).
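For context, the declaration of the single-precision factorization routine, as given in the cuBLAS documentation (comments mine), is:

cublasStatus_t cublasSgetrfBatched(cublasHandle_t handle,
                                   int n,
                                   float *const Aarray[],  // device array of device pointers
                                   int lda,
                                   int *PivotArray,        // device, n * batchSize ints
                                   int *infoArray,         // device, batchSize ints
                                   int batchSize);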
As can be seen, the input parameters of these functions are arrays of pointers to matrices. Imagine that I've already allocated memory on the GPU for the (A^T * A) matrix as a result of previous calculations:
// device buffer holding the n x n product A^T * A
float* dProdATA;
cudaError_t cudaStat = cudaMalloc((void **)&dProdATA, n*n*sizeof(*dProdATA));
Is it possible to run the factorization (inversion)
// note: &dProdATA here is a HOST address of the device pointer dProdATA
cublasSgetrfBatched(handle, n, &dProdATA, lda, P, INFO, mybatch);
without additional HOST <-> GPU memory copying (see a working example of inverting an array of matrices) and without allocating a single-element pointer array on the device, i.e., just by obtaining a GPU-side reference to the GPU pointer?
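For completeness, this is the workaround I'm currently using and would like to eliminate (a minimal sketch continuing the variables above; dArrayA and the batch size of 1 are my own choices):

// one-element device array of device pointers, filled with an
// extra HOST -> GPU copy -- exactly the overhead I want to avoid
float** dArrayA;
cudaMalloc((void **)&dArrayA, sizeof(float*));
cudaMemcpy(dArrayA, &dProdATA, sizeof(float*), cudaMemcpyHostToDevice);

int *P, *INFO;
cudaMalloc((void **)&P, n * sizeof(int));   // pivot indices, n per matrix
cudaMalloc((void **)&INFO, sizeof(int));    // one status value per matrix

cublasSgetrfBatched(handle, n, dArrayA, lda, P, INFO, 1);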