I've been calling this in OpenMP:
```cpp
#pragma omp parallel for num_threads(totalThreads)
for (unsigned i = 0; i < totalThreads; i++)
{
    workOnTheseEdges(startIndex[i], endIndex[i]);
}
```
And this with C++11 std::thread (I believe those are just pthreads underneath):
```cpp
vector<thread> threads;
for (unsigned i = 0; i < totalThreads; i++)
{
    // start one thread per (startIndex[i], endIndex[i]) range
    threads.emplace_back(workOnTheseEdges, startIndex[i], endIndex[i]);
}
for (auto& t : threads)  // named t to avoid shadowing the type name thread
{
    t.join();
}
```
But the OpenMP implementation is about twice as fast! I would have expected C++11 threads to be faster, since they are more low-level. Note: the code above is called not just once but roughly 10,000 times in a loop, so maybe that has something to do with it?
Edit: For clarification, in practice I use either the OpenMP version or the C++11 version, not both. With the OpenMP code it takes 45 seconds; with the C++11 code it takes 100 seconds.
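To check whether the ~10,000 repetitions matter, one option is to time the std::thread pattern on its own with trivial work, so that nearly all of the measured time is thread creation and joining. This is a minimal sketch; `timeSpawnJoin` and `SpawnTiming` are hypothetical names, and the empty lambda stands in for `workOnTheseEdges`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical result type for this illustration.
struct SpawnTiming {
    long long tasksRun;  // how many thread bodies actually executed
    double seconds;      // wall-clock time for all iterations
};

// Hypothetical helper: repeats the question's pattern (create totalThreads
// std::threads, then join them) `iterations` times with almost no work,
// so the elapsed time is dominated by thread creation/join overhead.
SpawnTiming timeSpawnJoin(unsigned totalThreads, int iterations) {
    assert(totalThreads > 0);
    std::atomic<long long> tasksRun{0};
    auto start = std::chrono::steady_clock::now();
    for (int iter = 0; iter < iterations; ++iter) {
        std::vector<std::thread> threads;
        threads.reserve(totalThreads);
        for (unsigned i = 0; i < totalThreads; ++i)
            threads.emplace_back([&tasksRun] { tasksRun.fetch_add(1, std::memory_order_relaxed); });
        for (auto& t : threads)
            t.join();
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return {tasksRun.load(), elapsed.count()};
}
```

Calling `timeSpawnJoin(totalThreads, 10000)` and comparing the result with the 100 s total gives a rough idea of how much of the gap is pure thread-management overhead; the exact figure will depend on the OS and hardware.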
… totalThreads is, how many cores/HW threads your CPU has, what the size of startIndex is, and how much time it takes to execute workOnTheseEdges() once. - Hristo Iliev
… std::thread version creates totalThreads new threads, but the OpenMP version is reusing the same 16 each time the loop executes. - Mooing Duck
… std::async to reuse the same threads. Other than that, you'd have to manage the 16 threads yourself (which is pretty easy for your case. Make the threads vector static: coliru.stacked-crooked.com/a/3fdad471c0c26d41) - Mooing Duck
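The comments point at the likely cause: the std::thread version creates and destroys totalThreads threads on every one of the ~10,000 calls, whereas OpenMP runtimes typically keep their worker threads alive between parallel regions. Below is a minimal sketch of managing the threads yourself, assuming each call hands exactly one task to each thread. `PersistentWorkers` and `run()` are hypothetical names; this is not the standard library's API, and not necessarily what the linked coliru example does.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: keeps `count` worker threads alive for the object's lifetime.
// Each run() call executes task(i) once on worker i, then blocks until all finish.
class PersistentWorkers {
public:
    explicit PersistentWorkers(unsigned count) {
        assert(count > 0);
        for (unsigned i = 0; i < count; ++i)
            workers_.emplace_back([this, i] { loop(i); });
    }

    ~PersistentWorkers() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        startCv_.notify_all();
        for (auto& t : workers_)
            t.join();
    }

    void run(const std::function<void(unsigned)>& task) {
        std::unique_lock<std::mutex> lock(m_);
        task_ = &task;
        pending_ = static_cast<unsigned>(workers_.size());
        ++generation_;  // signals a new batch of work
        startCv_.notify_all();
        doneCv_.wait(lock, [this] { return pending_ == 0; });
        task_ = nullptr;
    }

private:
    void loop(unsigned index) {
        unsigned long long seen = 0;  // last generation this worker processed
        for (;;) {
            const std::function<void(unsigned)>* task;
            {
                std::unique_lock<std::mutex> lock(m_);
                startCv_.wait(lock, [&] { return stop_ || generation_ != seen; });
                if (stop_) return;
                seen = generation_;
                task = task_;
            }
            (*task)(index);  // do the actual work outside the lock
            {
                std::lock_guard<std::mutex> lock(m_);
                if (--pending_ == 0) doneCv_.notify_one();
            }
        }
    }

    std::vector<std::thread> workers_;
    std::mutex m_;
    std::condition_variable startCv_, doneCv_;
    const std::function<void(unsigned)>* task_ = nullptr;
    unsigned pending_ = 0;
    unsigned long long generation_ = 0;
    bool stop_ = false;
};
```

The idea would be to construct the pool once, outside the 10,000-iteration loop, and replace the spawn/join code with `pool.run([&](unsigned i) { workOnTheseEdges(startIndex[i], endIndex[i]); });`. Each call still pays for waking the workers through a condition variable, and some OpenMP runtimes spin-wait briefly instead of sleeping, so this sketch is not guaranteed to match the OpenMP timings.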