Or: how do I convert parfor loops to gpuArray operations?
Problem at hand: a column-wise operation on a large data matrix, distributed to multiple CPUs (workers) via parfor.
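data = rand(1000, 200);            % 1000 x 200 data matrix
[nrows, ncols] = size(data);
colwiseResult = zeros(1, ncols);   % preallocate the sliced output
parfor ix = 1:ncols
    workerData = data(:, ix);
    % colFun stands in for the actual per-column function
    colwiseResult(ix) = colFun(workerData, params);
end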
Now, how could this be made to use GPUs efficiently, especially since the problem needs to scale up to 1000 x 1000 matrices (and larger)? Moving the data to the GPU is the easy part:
dataGPU = gpuArray(data);
But I can't find an easy way of doing column-wise operations on the GPU. Functions such as arrayfun or bsxfun operate element by element, which is not what I am interested in. Since these are trivially parallel tasks, exploiting the many processors on the GPU would be ideal (and would avoid bothering with parfor etc.).
(Actually, I am performing a likelihood calculation that does matrix diagonalization, exponentiation and products progressively over each data point of the column vector, using a for loop inside the workers. I have checked that all of these operations have gpuArray overloads.)
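To make the shape of the problem concrete, here is a minimal sketch of one possible restructuring: keep the sequential loop over data points (rows), but process all columns at once by stacking one small matrix per column as pages of a 3-D gpuArray and multiplying them with pagefun. This assumes Parallel Computing Toolbox and a supported GPU. The state dimension k, the matrices P and the update rule are hypothetical stand-ins, not the actual likelihood.

```matlab
% Sketch only: k, P and the update rule are hypothetical placeholders.
data = rand(1000, 200);                  % same shape as in the question
[nrows, ncols] = size(data);
k = 4;                                   % hypothetical state dimension

dataGPU = gpuArray(data);

% One k-by-k matrix per column, stored as pages of a 3-D array.
P = rand(k, k, ncols, 'gpuArray');
P = P ./ sum(P, 2);                      % normalise each row of each page

% One 1-by-k state vector per column, plus a running log-likelihood.
v    = ones(1, k, ncols, 'gpuArray') / k;
logL = zeros(1, 1, ncols, 'gpuArray');

for t = 1:nrows
    % Data point t of every column, shaped to line up with the pages.
    w = reshape(dataGPU(t, :), 1, 1, ncols);
    % Batched vector-matrix product: all columns advance in one call.
    v = pagefun(@mtimes, v, P) .* w;
    % Rescale to avoid underflow and accumulate the log of the scale.
    s = sum(v, 2);
    v = v ./ s;
    logL = logL + log(s);
end

colwiseResult = gather(reshape(logL, 1, ncols));   % back to host memory
```

The loop over t stays serial because each step depends on the previous one; the parallelism comes from the ncols pages handled inside each pagefun call. Whether this is faster than parfor probably depends on k and ncols, since small matrices give the GPU little work per page.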