2 votes

I need to run several independent analyses on the same data set. Specifically, I need to run batches of 100 GLM (generalized linear model) analyses, and I was thinking of taking advantage of my video card (a GTX 580).

As I have access to MATLAB and the Parallel Computing Toolbox (and I'm not good with C++), I decided to give it a try.

I understand that a single GLM is not ideal for parallel computing, but as I need to run 100-200 in parallel, I thought that using parfor could be a solution.
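For context, this is the kind of pattern I had in mind (a minimal sketch with made-up sizes and Poisson responses, not my actual data): one independent glmfit call per parfor iteration.

    nModels = 100;
    n = 1000; p = 10;                      % made-up sizes for illustration
    X = randn(n, p);                       % design matrix shared by all models
    Y = poissrnd(5, n, nModels);           % one response vector per model

    coeffs = zeros(p + 1, nModels);        % glmfit adds an intercept term
    parfor k = 1:nModels
        coeffs(:, k) = glmfit(X, Y(:, k), 'poisson');   % each fit is independent
    end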

My problem is that it is not clear to me which approach I should follow. I wrote a gpuArray version of the MATLAB function glmfit, but using parfor gives no advantage over a standard for loop.
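To make the question concrete, my gpuArray attempt looks roughly like this (a simplified IRLS sketch for a Poisson/log-link model, not my actual code; no convergence check or intercept handling). Everything runs on the single GPU, which is presumably why wrapping calls like this in parfor buys nothing.

    function b = glmfitPoissonGPU(X, y)
    % Simplified IRLS for a Poisson GLM with log link, kept on the GPU.
    Xg = gpuArray(X);                          % one transfer to the device
    yg = gpuArray(y);
    b  = gpuArray(zeros(size(X, 2), 1));       % start from a zero coefficient vector
    for it = 1:25                              % fixed iteration cap for brevity
        eta = Xg * b;
        mu  = exp(eta);
        w   = mu;                              % IRLS weights for Poisson/log link
        z   = eta + (yg - mu) ./ mu;           % working response
        b   = (Xg' * bsxfun(@times, w, Xg)) \ (Xg' * (w .* z));
    end
    b = gather(b);                             % one transfer back to the host
    end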

Does this have anything to do with the matlabpool setting? It is not even clear to me how to set it so that it "sees" the GPU card. By default, it is set to the number of CPU cores (4 in my case), if I'm not mistaken. Is my whole approach wrong?
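As far as I understand (please correct me if I'm wrong), matlabpool only controls the number of CPU worker processes and never "sees" the GPU at all; the device is selected separately:

    matlabpool open 4     % CPU workers only; defaults to the number of cores
    gpuDeviceCount        % how many CUDA devices MATLAB can see
    dev = gpuDevice(1)    % select and inspect the first GPU (the GTX 580 here)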

Any suggestion would be highly appreciated.

Edit

Thanks. I'm aware of GPUmat and Jacket, and I could start writing in C without too much effort, but I'm evaluating GPU computing for a department where everybody uses MATLAB or R. The final goal would be a cluster based on C2050 cards and the MATLAB Distributed Computing Server (or at least that was the first plan). Reading the ads from MathWorks, I was under the impression that parallel computing was possible even without C skills. It is impossible to ask the researchers in my department to learn C, so I'm guessing that GPUmat and Jacket are the better solutions, even though their limitations are quite significant and support for several commonly used routines such as glm is non-existent.

How can they be interfaced with a cluster? Do they work with some job distribution system?


2 Answers

4 votes

I would recommend trying either GPUmat (free) or AccelerEyes Jacket (commercial, but with a free trial) rather than the Parallel Computing Toolbox. The toolbox doesn't have as much functionality.

To get the most performance, you may want to learn some C (no need for C++) and code in raw CUDA yourself. Many of these high level tools may not be smart enough about how they manage memory transfers (you could lose all your computational benefits from needlessly shuffling data across the PCI-E bus).
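As a rough illustration of the transfer problem (placeholder arrays, not a real workload): keep intermediate results on the device and gather only once at the end, rather than bouncing every result back over the bus.

    A = gpuArray(rand(5000));
    B = gpuArray(rand(5000));

    % Wasteful: gathering after every step forces a device-to-host copy each time
    C1 = gather(A * B);
    C2 = gather(gpuArray(C1) + 1);

    % Better: chain the work on the GPU and gather once at the end
    C  = gather(A * B + 1);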

2 votes

parfor will help you utilize multiple GPUs, but not a single GPU. The thing is that a single GPU can only do one thing at a time, so parfor on a single GPU and for on a single GPU achieve exactly the same effect (as you are seeing).
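If you do get a machine with more than one GPU, the usual pattern is one worker per device (a sketch below, assuming the X/Y/nModels setup from your question; fitOneModel is a placeholder for whatever per-model GPU fit you write):

    matlabpool open 2              % one CPU worker per GPU
    spmd
        gpuDevice(labindex);       % worker 1 -> GPU 1, worker 2 -> GPU 2
    end
    parfor k = 1:nModels
        results(:, k) = fitOneModel(X, Y(:, k));   % placeholder per-model GPU fit
    end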

Jacket tends to be more efficient, as it can combine multiple operations and run them together, and it has more features; but most departments already have the Parallel Computing Toolbox rather than Jacket, so that can be an issue. You can try the demo to check.

I have no experience with GPUmat.

The Parallel Computing Toolbox is getting better; what you need is some large matrix operations. GPUs are good at doing the same thing many times over, so you need to either combine your code into one operation somehow or make each operation big enough. We are talking about needing ~10,000 things in parallel at the very least, although that's not a set of 1e4 matrices but rather one large matrix with at least 1e4 elements.
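As an example of "making the operation big enough": if all 100 models share the same design matrix, you can stack the responses column-wise and do one large solve instead of 100 small ones (an illustrative least-squares step, not a full GLM fit):

    Xg = gpuArray(X);               % n-by-p design matrix shared by all models
    Yg = gpuArray(Y);               % n-by-100, one column of responses per model
    B  = (Xg' * Xg) \ (Xg' * Yg);   % one solve yields coefficients for all 100 models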

I do find that with the Parallel Computing Toolbox you still need quite a bit of inline CUDA code to be effective (it's still pretty limited). It does let you inline kernels and turn MATLAB code into kernels more easily, though.
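For example, arrayfun applied to gpuArray inputs compiles an element-wise MATLAB function into a single fused GPU kernel (a toy function below), and parallel.gpu.CUDAKernel lets you call hand-written CUDA from MATLAB:

    x = gpuArray(rand(1e6, 1));
    y = gpuArray(rand(1e6, 1));
    f = @(a, b) log(1 + exp(a .* b));   % toy element-wise function
    z = arrayfun(f, x, y);              % executed as one kernel on the GPU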