So I was looking up how to do some parallelism using only the C++ standard library and found the following bit of code in another question here on Stack Overflow:
template <typename RAIter> //FOUND ON STACK OVERFLOW
int parallel_sum(RAIter beg, RAIter end)
{
    auto len = end - beg;
    if (len < 1000)
        return std::accumulate(beg, end, 0);
    RAIter mid = beg + len / 2;
    auto handle = std::async(std::launch::async,
                             parallel_sum<RAIter>, mid, end);
    int sum = parallel_sum(beg, mid);
    return sum + handle.get();
}
I wanted to make a general parallel_for_each function that loops over a (hopefully) arbitrary container type and applies an algorithm to each entry, so I modified the above to the following:
template <typename ContainerIterator, typename containerSizeType, typename AlgorithmPerEntry> //modified version of parallel sum code above : https://stackguides.com/questions/36246300/parallel-loops-in-c
void parallel_for_each(ContainerIterator beg, ContainerIterator end, AlgorithmPerEntry& algorithm, containerSizeType maxProbSize)
{
    containerSizeType len = end - beg;
    if (len < maxProbSize){//if you are sufficiently small, go ahead and execute
        std::for_each(beg, end, algorithm);
        std::cout << "working on processor with id = " << GetCurrentProcessorNumber() << std::endl;//the processor id's change so I'm assuming this is executing in parallel
        return;
    }
    //otherwise, continue spawning more threads
    ContainerIterator mid = beg + len / 2;
    auto handle = std::async(std::launch::async,
                             parallel_for_each<ContainerIterator, containerSizeType, AlgorithmPerEntry>, mid, end, algorithm, maxProbSize);
    parallel_for_each(beg, mid, algorithm, maxProbSize);
    handle.get(); //corrected as advised
}
I wanted to test it with a super simple functor, so I made the following:
template<typename T>
struct dataSetter
{
    const T& set_to;
    dataSetter(const T& set_to_in) : set_to(set_to_in){}
    void operator()(T& set_this)
    {
        set_this = set_to;
    }
};
Pretty straightforward: its operator() just assigns the stored value to whatever argument it is given.
Here's my main function's body:
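For reference, here is a minimal serial baseline for what the functor is expected to produce. It is only a sketch: `ValueSetter` and `serial_fill_and_sum` are hypothetical names, and the functor stores its value by copy rather than by reference, which is an assumption made here so that it cannot dangle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same shape as the dataSetter functor, but storing the value by copy
// (an assumption of this sketch, so the functor cannot outlive its source).
template <typename T>
struct ValueSetter
{
    T set_to;
    explicit ValueSetter(const T& set_to_in) : set_to(set_to_in) {}
    void operator()(T& set_this) const { set_this = set_to; }
};

// Hypothetical helper: fills n ints serially, then returns their sum.
inline long long serial_fill_and_sum(std::size_t n, int value)
{
    assert(n > 0);
    std::vector<int> ints(n);
    std::for_each(ints.begin(), ints.end(), ValueSetter<int>(value));
    long long total = 0;
    for (int x : ints)
        total += x;
    return total;
}
```

With 100000 entries set to 7, the helper should return 700000, which is the value any correct parallel version has to match.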
std::vector<int> ints(100000);
unsigned minProbSize = 1000;
int setval = 7;
dataSetter<int> setter(setval);
parallel_for_each(ints.begin(), ints.end(), setter, minProbSize);//parallel assign everything to 7
//some sort of wait function to go here?
std::cout << std::endl << "PS sum of all ints = " << parallel_sum(ints.begin(), ints.end()) << std::endl; //parallel sum the entries
int total = 0;//serial sum the entries
for (unsigned i = 0; i < ints.size(); i++)
    total += ints[i];
std::cout << std::endl << "S sum of all ints = " << total << std::endl;
std::cout << std::endl << "PS sum of all ints = " << parallel_sum(ints.begin(), ints.end()) << std::endl; //parallel sum the entries again
Here are some outputs:
PS sum of all ints = 689052
S sum of all ints = 700000
PS sum of all ints = 700000
Output from another run:
PS sum of all ints = 514024
S sum of all ints = 700000
PS sum of all ints = 700000
The first parallel sum over the vector consistently comes out low. My guess as to what is happening is that all the assignment threads get created, then the summing threads get created, but certain sum threads are executing prematurely (before the last assignment thread has finished). Is there any way I can force a wait? And as always, I'm open to all advice.
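One way to force the wait is to join each spawned task before the function that spawned it returns. The following is a sketch under stated assumptions, not a drop-in replacement: `parallel_for_each_sketch` and `fill_then_sum` are hypothetical names, the grain size is a plain `std::ptrdiff_t`, and the functor is shared between tasks through `std::ref`, which assumes it is safe to call concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Recursive fork-join for_each: the right half runs in a new task, the left
// half on the calling thread, and the task is joined before returning.
template <typename RAIter, typename Func>
void parallel_for_each_sketch(RAIter beg, RAIter end, Func& f, std::ptrdiff_t grain)
{
    assert(grain > 0);
    const auto len = end - beg;
    if (len < grain) {
        // Small enough: process the range on this thread.
        std::for_each(beg, end, std::ref(f));
        return;
    }
    RAIter mid = beg + len / 2;
    // std::ref keeps std::async from copying the functor.
    auto handle = std::async(std::launch::async,
                             parallel_for_each_sketch<RAIter, Func>,
                             mid, end, std::ref(f), grain);
    parallel_for_each_sketch(beg, mid, f, grain);
    handle.get(); // blocks until the right half has finished
}

// Hypothetical driver: sets n ints to `value` in parallel, then sums them.
inline long long fill_then_sum(std::size_t n, int value)
{
    std::vector<int> ints(n);
    auto set = [value](int& x) { x = value; };
    parallel_for_each_sketch(ints.begin(), ints.end(), set, 1000);
    // Every task has been joined here, so reading the vector is race-free.
    return std::accumulate(ints.begin(), ints.end(), 0LL);
}
```

Because every level of the recursion calls `get()` on the task it started, the outermost call cannot return while any assignment is still running. With `std::launch::async` each split starts its own thread, so a larger grain size keeps the thread count down.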
handle.get(). This means you aren't waiting for the async invocation to complete. When the "root" parallel_for_each call returns, some "spawned" asynchronous calls may still be running, assigning to the very same elements your parallel_sum is reading. Your program therefore exhibits undefined behavior by way of a data race. - Igor Tandetnik
handle would block until the async invocation completes. I guess you are using an older compiler that's conforming to C++11 but not C++14. - Igor Tandetnik
handle.get() has a side effect of blocking until the asynchronous operation completes (its primary effect is to obtain the return value of said asynchronous operation, but yours doesn't return any). You want to call it after the synchronous call - basically, a) start async, b) do something else for a while, c) wait for (a). If you swap (b) and (c), you defeat the purpose. - Igor Tandetnik
std::async. They fixed it in MSVS2015 I think. - Yakk - Adam Nevraumont
~future block for an async manufactured future. That's quite subtle - it takes some work to put all the pieces together. I guess it's a good thing that C++14 added a clarification. - Igor Tandetnik
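The (a), (b), (c) ordering described in the comments can be sketched in isolation. This is an illustrative example only: `get_waits_for_task` is a hypothetical name, and the 50 ms sleep merely stands in for real work done by the asynchronous task.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical demo: returns true if the write made by the async task is
// visible after get(), i.e. get() waited for the task to finish.
inline bool get_waits_for_task()
{
    bool done = false;
    // (a) start the asynchronous work
    std::future<void> handle = std::async(std::launch::async, [&done] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        done = true;
    });
    // (b) do something else on the calling thread in the meantime
    int local = 0;
    for (int i = 0; i < 1000; ++i)
        local += i;
    assert(local == 499500);
    // (c) wait for (a); get() blocks until the task has completed
    handle.get();
    return done;
}
```

Calling `get()` before step (b) would still be correct, but the calling thread would then sit idle while the task runs, which is the "defeat the purpose" case mentioned above.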