3
votes

As far as I know, there are two approaches to parallel processing: message passing (e.g. the Message Passing Interface, MPI) and multithreading. Multithreading cannot be used on distributed-memory systems without message passing, but MPI can be used on both shared-memory and distributed-memory systems. My question is about the performance of code that is parallelized with MPI and run on a shared-memory system. Is its performance in the same range as that of code parallelized with multithreading?

Update:

My job is such that the processes need to communicate with each other repeatedly, and the communication array can be a 200*200 matrix.
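For scale, a quick calculation of the message size involved (assuming double-precision entries, which the question does not state):

```python
# Rough size of one 200*200 message, assuming 64-bit floating-point entries.
n = 200
bytes_per_entry = 8            # double precision (an assumption)
size_bytes = n * n * bytes_per_entry
print(size_bytes)              # 320000 bytes, i.e. ~312.5 KiB per exchange
```

At a few hundred kilobytes per exchange, repeated communication cost is likely to matter, which is why the computation-to-communication ratio discussed in the answers is the key factor here.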

3
Do you know about OpenMP? There are many other parallelization schemes out there too. - betabandido
Yes, I know; but fundamentally all these methods divide into two approaches: multithreading and message passing. - peaceman

3 Answers

2
votes

Let's assume we only consider MPI and OpenMP, since they are the two major representatives of the two parallel programming families you mention. For distributed systems, MPI is the only option between different nodes. Within a single node, however, as you say, you can use either MPI or OpenMP. Which one performs better really depends on the application you are running, and specifically on its computation-to-communication ratio. Here you can see a comparison of MPI and OpenMP for a multicore processor, which confirms the same observation.

You can go a step further and use a hybrid approach: MPI between nodes, and OpenMP within each node. This is called hybrid MPI+OpenMP parallel programming. You can also apply it within a single node that contains a hybrid CMP+SMT processor.
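To make the two levels of the hybrid approach concrete, here is a toy analogue in Python rather than real MPI/OpenMP code: worker processes stand in for MPI ranks across nodes, and threads inside each process stand in for OpenMP threads within a node. The function names and the two-way splits are illustrative choices, not part of any real hybrid framework.

```python
# Toy analogue of hybrid MPI+OpenMP: processes play the role of MPI ranks,
# threads play the role of OpenMP threads within each rank.
from multiprocessing import Pool
from threading import Thread

def partial_sum(chunk, out, idx):
    # One "OpenMP thread": reduce its share of the rank's data.
    out[idx] = sum(chunk)

def node_work(data):
    # "OpenMP" level: split this rank's data across two threads.
    mid = len(data) // 2
    results = [0, 0]
    threads = [Thread(target=partial_sum, args=(data[:mid], results, 0)),
               Thread(target=partial_sum, args=(data[mid:], results, 1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0] + results[1]

if __name__ == "__main__":
    data = list(range(1000))
    # "MPI" level: distribute halves of the data to two worker processes.
    with Pool(2) as pool:
        node_totals = pool.map(node_work, [data[:500], data[500:]])
    print(sum(node_totals))  # 499500
```

In a real hybrid code the outer level would be MPI ranks exchanging messages and the inner level an OpenMP `parallel for`, but the decomposition pattern is the same: coarse partitioning between address spaces, fine-grained sharing within one.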

You can check some information here and here. Moreover this paper compares an MPI approach vs a hybrid MPI+OpenMP one.

3
votes

The answer is: it depends. MPI processes are predominantly separate OS processes, and communication between them occurs through some sort of shared-memory IPC technique when the communicating processes run on the same shared-memory node. Being separate OS processes, MPI processes in general do not share data, and sometimes data has to be replicated in each process, which leads to less than optimal memory usage. Threads, on the other hand, can share lots of data and can benefit from cache reuse, especially on multicore CPUs with large shared last-level caches (e.g. the L3 cache on current-generation x86 CPUs). Cache reuse, combined with more lightweight methods of data exchange between threads (often just synchronisation, since the working data is already shared), can lead to better performance than what is achievable by separate processes.
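The sharing-versus-replication point above can be demonstrated with a small Python sketch (a conceptual illustration, not actual MPI behaviour): a thread mutates the parent's data in place, while a child process only ever modifies its own copy.

```python
# Sketch: threads share the parent's data; a child process gets its own copy.
# This mirrors the threads-vs-separate-processes memory story described above.
import threading
import multiprocessing

data = [0]

def bump():
    data[0] += 1  # mutates whatever copy of `data` this worker can see

if __name__ == "__main__":
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    print(data[0])  # 1 -- the thread saw and changed the shared list

    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print(data[0])  # still 1 -- the child modified its own replicated copy
```

With real MPI processes the replication is the same in spirit: each rank holds its own buffers, and any sharing has to go through explicit messages or shared-memory windows.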

But once again - it depends.

0
votes

In my opinion, they're simply suited to different jobs. The actor model is great at asynchronously performing many different tasks at different times, whereas the OpenMP/TBB/PPL model is great for performing one task in parallel, very simply and reliably.