I seem to be one of the few people using MATLAB Coder (the codegen command) to get speedups, judging by how little discussion or help there is online. I've gotten incredible speedups from it in some cases. I've never seen it documented, but when I use codegen to make a MEX file from MATLAB code containing a parfor loop, it will often thread the resulting MEX. parfor in ordinary MATLAB functions spawns multiple processes, which is often less efficient than just threading. (I'm inferring all this from watching top in Linux: I see multiple 100% processes when running the MATLAB functions, but a single process at e.g. 1000% when running the converted MEX.) I'm working on a case now where I could really use the speedup, but I see no evidence of multiple threads being used in the MEX, even though parfor is working in the base function. Does anyone know what the hangup might be, or how the coder chooses when to thread?
1 Answer
It will only thread the parfor loop itself. It would be dangerous for the coder to guess where parallelism is appropriate, and impossible for it to calculate that on its own.
If I were you, I would try replacing for with parfor everywhere in the MATLAB code that I could.
And now, how to determine whether a loop is acceptable to parallelize:
- Does it use any results from a previous iteration? If so, don't try. Seriously, it will only make things worse.
- Does it use I/O in any form? If so, don't. It will slow the loop down and remove any determinism from the code.
- Is there a loop for parfor to replace at all? If not, you'll have to live with the current performance, because there might not be anything to parallelize.
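To make the first check concrete, here is a minimal sketch contrasting a loop that depends on the previous iteration with one whose iterations are independent. The variable names and the arithmetic are made up for illustration and are not from the original question.

```matlab
% Hypothetical example data; all names here are illustrative only.
n = 1000;
x = rand(1, n);

% Not a parfor candidate as written: iteration k reads the result
% that iteration k-1 produced.
runningSum = zeros(1, n);
runningSum(1) = x(1);
for k = 2:n
    runningSum(k) = runningSum(k-1) + x(k);
end

% A parfor candidate: each iteration reads only x(k) and writes only y(k),
% so the iterations can run in any order.
y = zeros(1, n);
parfor k = 1:n
    y(k) = sqrt(x(k)) + sin(x(k))^2;
end
```

Only the second loop was changed to parfor; the first is left as a plain for loop because its iterations cannot be reordered.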
parfor in MATLAB runs on background worker processes. MATLAB Coder will convert parfor-loops into multithreaded C/C++ code using OpenMP (search for #pragma omp in the generated code): mathworks.com/help/coder/ref/parfor.html, mathworks.com/help/coder/ug/… - Amro
[...] NumThreads input to parfor. However, as far as I know it's not documented how the number of threads up to that maximum is chosen. Perhaps @Edric would know, if he's listening? - Sam Roberts
[...] setenv('OMP_NUM_THREADS','8') before running the compiled MEX-function. Note that this might affect other builtin functions as well that are also multithreaded (I think Intel MKL providing BLAS/LAPACK/FFT routines is affected) - Amro
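Putting the comments together, a rough sketch of the workflow might look like this. The function name scaleRows, the array sizes, and the thread count of 8 are assumptions chosen for illustration. Whether the generated MEX is actually threaded depends on having MATLAB Coder and a compiler with OpenMP support, so treat this as something to experiment with rather than a guaranteed recipe.

```matlab
% File: scaleRows.m  (hypothetical function, for illustration only)
function out = scaleRows(A) %#codegen
    out = zeros(size(A));
    % The second argument caps the number of threads in the generated code.
    parfor (r = 1:size(A, 1), 8)
        % Each row is processed independently of every other row.
        out(r, :) = A(r, :) ./ max(abs(A(r, :)));
    end
end
```

```matlab
% At the MATLAB prompt: build the MEX for a fixed-size double input, then run it.
codegen scaleRows -args {zeros(2000, 500)}
setenv('OMP_NUM_THREADS', '8');   % as suggested in the comments; 8 is arbitrary
B = scaleRows_mex(rand(2000, 500));
```

After building, searching the generated C source for #pragma omp is a quick way to confirm whether the loop was converted to OpenMP at all. If the pragma is missing, the loop was compiled as an ordinary serial loop.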