1
votes

I seem to be one of the few people using MATLAB Coder (the codegen command) to get speedups, judging by how little discussion or help there is online. I've gotten incredible speedups from it in some cases. I've never seen it documented, but when I use codegen to make a MEX file from MATLAB code with a parfor loop, the resulting MEX is often multithreaded. In ordinary MATLAB functions, parfor spawns multiple processes, which is often less efficient than threading. (I'm inferring all this from watching top on Linux: I see multiple 100% processes when running MATLAB functions, but a single process at e.g. 1000% when running the converted MEX.) I'm working on a case now where I could really use the speedup, but I see no evidence of multiple threads being used in the MEX, even though parfor works in the original function. Does anyone know what the hangup might be, or how the coder decides when to thread?

1
parfor in MATLAB runs on background worker processes. MATLAB Coder converts parfor-loops into multithreaded C/C++ code using OpenMP (search for #pragma omp in the generated code): mathworks.com/help/coder/ref/parfor.html, mathworks.com/help/coder/ug/… - Amro
You can specify a maximum number of threads using the NumThreads input to parfor. However, as far as I know it's not documented how the number of threads up to that maximum is chosen. Perhaps @Edric would know, if he's listening? - Sam Roberts
@SamRoberts: you can use environment variables to control the max number of threads. Try setenv('OMP_NUM_THREADS','8') before running the compiled MEX-function. Note that this might also affect other built-in functions that are multithreaded (I think Intel MKL, which provides the BLAS/LAPACK/FFT routines, is affected) - Amro

1 Answer

0
votes

The coder will only thread the parfor loop itself. It would be dangerous for the coder to guess where else parallelism is appropriate, and impossible for it to calculate.

If I were you, I would try replacing for-loops with parfor wherever I could in the MATLAB code.

Here is how to determine whether a loop is safe to parallelize:

  1. Does it use any results from a previous iteration? If so, don't try. Seriously, it will only make things worse.
  2. Does it use I/O in any form? If so, don't. It will slow the loop down and remove any determinism from the code.
  3. Is there a loop for parfor to replace? If not, you'll have to live with the current performance, because there may be nothing to parallelize.
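
To make the first check concrete, here is a minimal sketch contrasting a loop that is safe for parfor with one that is not. The function name loop_examples and its variables are hypothetical, chosen only for illustration, and the input is assumed to be a row vector.

```matlab
function [sq, running] = loop_examples(x)
% LOOP_EXAMPLES  Hypothetical example contrasting two kinds of loop.
% Assumes x is a row vector of doubles.
%#codegen
n = numel(x);

% Independent iterations: sq(i) depends only on x(i), so the
% iterations can run in any order and parfor is safe here.
sq = zeros(1, n);
parfor i = 1:n
    sq(i) = x(i)^2;
end

% Loop-carried dependency: running(i) needs running(i-1), the result
% of the previous iteration, so this stays an ordinary for-loop.
running = zeros(1, n);
if n > 0
    running(1) = x(1);
end
for i = 2:n
    running(i) = running(i-1) + x(i);
end
end
```

Calling loop_examples([1 2 3 4]) returns sq = [1 4 9 16] and running = [1 3 6 10]. To build a MEX from it, something like codegen loop_examples -args {coder.typeof(0,[1 Inf])} should work, after which only the parfor loop would be expected to show up under #pragma omp in the generated code. Whether it actually runs on several threads also depends on the compiler's OpenMP support and on settings such as OMP_NUM_THREADS mentioned in the comments above.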