Multi-threaded vs. multi-process design approach for CPU-intensive applications

Question

We have to design a system that runs parallel algorithms in iterations and syncs after certain steps, a kind of fork-join model. The sync after every few steps is required to exchange data via shared memory before continuing with the next iterations.
These loops continue until a user-specified time has elapsed. One loop acts as the controller that coordinates the sync points (a spinlock in our case). The goal is also to run as many iterations as possible (no sleep) in these code paths. When we modeled this behavior with multiple processes vs. multiple threads, the threads did not scale as well as the processes. This is not a memory-intensive application. The C++ code shows a similar pattern on both Windows and Linux. In the first design, the controller is one application that manages the spinlocks, and 3 other applications are launched, each waiting on its respective spinlock. In the second design, the same logic is deployed as multiple threads in one application.

The benchmark for our design is to maximize the number of sync points reached in a given time. As I increased the number of processes or threads, performance degraded in both cases, but the degradation with threads was much worse. Even though 5 cores are 100% loaded in both cases, threads perform badly once there are more than 4 of them. Our plan is to use at most 6 threads. To eliminate context-switch overhead, we tried Boost fibers, but the results were not promising.

Why are threads not performing on par with multiple processes?

We ran the tests on an Intel i7 desktop with the same configuration for Windows and Linux.

Your question does not seem to be about programming, but about design or performance. — P5music

David H · Accepted Answer · 2019-12-27T16:41:48

You might want to check cache hit rate and context switches.

A process has its own memory space, so the data it works on tends to sit in the caches closest to the core it is running on. Threads share one memory space, so they may have to deal with data whose most recent copy is in a cache close to one core and further away from the others (L1 hits vs. L2 hits vs. L3 hits). Not all cache hits cost the same.
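One common, concrete form of this effect is several threads writing to variables that happen to share a cache line, so the line keeps moving between cores. Whether that is what happens in the question's code is unknown; the following is only a minimal sketch of the usual countermeasure, giving each thread's data its own cache line. The 64-byte line size is an assumption (check it for your CPU), `PaddedCounter` and `run_workers` are hypothetical names, and the over-aligned `std::vector` storage relies on C++17.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (common on x86-64, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// Each counter is aligned to its own cache line, so one thread's writes
// do not invalidate the line that holds another thread's counter.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(alignof(PaddedCounter) == kCacheLine, "one counter per line");

// Hypothetical helper: every worker increments only its own counter.
// Returns the sum of all counters after the workers have joined.
std::uint64_t run_workers(unsigned num_threads, std::uint64_t iterations) {
    assert(num_threads > 0);
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (auto& c : counters) total += c.value.load(std::memory_order_relaxed);
    return total;
}
```

Timing `run_workers` with and without the `alignas` is a quick way to see whether cache-line sharing matters on a given machine.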

You may also want to check how many context switches occur, that is, how often a process is scheduled and unscheduled. You want to minimize that.
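As a rough way to count them from inside the program, here is a minimal sketch for Linux and other POSIX systems using `getrusage`, which reports voluntary and involuntary context switches for the whole process. `SwitchCounts`, `read_context_switches`, and `switches_during` are hypothetical names, and Windows would need a different mechanism.

```cpp
#include <sys/resource.h>  // getrusage, struct rusage (POSIX)
#include <cassert>

struct SwitchCounts {
    long voluntary;    // the task gave up the CPU (blocked or yielded)
    long involuntary;  // the scheduler preempted the task
};

// Reads the cumulative context-switch counters of the calling process.
SwitchCounts read_context_switches() {
    struct rusage usage{};
    int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;  // silence "unused" warnings when NDEBUG is defined
    return {usage.ru_nvcsw, usage.ru_nivcsw};
}

// Hypothetical helper: runs `work` and returns how many context
// switches the process accumulated while it was running.
template <typename Work>
SwitchCounts switches_during(Work&& work) {
    SwitchCounts before = read_context_switches();
    work();
    SwitchCounts after = read_context_switches();
    return {after.voluntary - before.voluntary,
            after.involuntary - before.involuntary};
}
```

Wrapping the benchmark loop in `switches_during` for both the multi-process and the multi-threaded build gives numbers that can be compared directly. A busy-spinning loop would be expected to show mostly involuntary switches.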

And then there is the problem that a rescheduled process may end up on the wrong processor, which may then have "the wrong cache" in front of it. Some kernels have an "affinity" function to decide where a rescheduled process should be placed. But that may not work for threads; I am not sure about that.
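On Linux at least, affinity can be set per thread: `sched_setaffinity` with a pid of 0 applies to the calling thread (Windows offers `SetThreadAffinityMask` for the same purpose). The following is a minimal Linux-only sketch; `first_allowed_cpu` and `pin_current_thread` are hypothetical helper names, and which core each worker should be pinned to is left open.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros; g++ usually defines it
#endif
#include <sched.h>

// Returns the first CPU the calling thread is currently allowed to run
// on, or -1 if the affinity mask could not be read.
int first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) return cpu;
    }
    return -1;
}

// Pins the calling thread to a single CPU. A pid of 0 means
// "the calling thread", so each worker thread calls this for itself.
bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

Each worker would call `pin_current_thread` once at startup, before entering its spin loop, so that it keeps running on the core whose caches already hold its data.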
