I'm trying to make a for loop multi-threaded in C++ so that the calculation gets divided to the multiple threads. Yet it contains data that needs to be joined together in the order as they are.
So the idea is to first join the small bits on many cores (25.000+ loops) and then join the combined data once more at the end.
std::vector<int> ids; // mappings
std::map<int, myData> combineData; // data per id
myData outputData; // combined data based on the mappings
myData threadData; // data per thread
#pragma parallel for default(none) private(data, threadData) shared(combineData)
for (int i=0; i<30000; i++)
{
threadData += combineData[ids[i]];
}
// Then here I would like to get all the seperate thread data and combine them in a similar manner
// I.e.: for each threadData: outputData += threadData
What would be the efficient and good way to approach this?
How can I schedule the openmp loop so that the scheduling is split evenly into chunks
For example for 2 threads: [0, 1, 2, 3, 4, .., 14999] & [15000, 15001, 15002, 15003, 15004, .., 29999]
If there's a better way to join the data (which involves joining a lot of std::vectors together and some matrix math), yet preserve the order of additions pointers to that would help as well.
Added information
- The addition is associative, though not commutative.
- myData is not an intrinsic type. It's a class containing data as multiple std::vectors (and some data related to the Autodesk Maya API.)
- Each cycle is doing a similar matrix multiplication to many points and adds these points to a vector (in theory the calculation time should stay roughly similar per cycle)
Basically it's adding mesh data (consisting of vectors of data) to eachother (combining meshes) though the order of the whole thing accounts for the index value of the vertices. The vertex index should be consistent and rebuildable.
myData
associative i.e. does(A + B) + C
= A + (B + C)`? That's going to effective whether or not you can parallelize this. See my answer. – Z boson