Efficient way to execute the sequential part (large number of operations + writing a file) of a parallel code?

Question

I have a C++ code that uses MPI and runs in a sequential-parallel-sequential pattern, which is repeated inside a time loop. When validating it against the serial code, I get a reduction in time for the parallel part; in fact, the reduction is almost linear in the number of processors.

The problem I am facing is that the time required for the sequential part also increases considerably when I use a larger number of processors.

The parallel part takes less time than the total sequential time of the entire program.
Therefore, although the parallel part gets faster with more processors, much of that saving is lost to the increase in time for the sequential part. The sequential part also includes a large number of computations at each time step and writes the data to an output file at certain specified times.
All processors run during the sequential part. The data is gathered to the root processor after the parallel computation, and only the root processor is allowed to write the file.
Can anyone suggest an efficient way to compute the serial part (large number of operations + writing the file) of the parallel code? I am happy to clarify any point if required.

Thanks in advance.

Andriy Tylychko · Accepted Answer · 2011-07-28T15:36:24

First of all, do the file writing from a separate thread (or process, in MPI terms), so the other threads can use your cores for computation.
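A minimal sketch of that idea, assuming plain C++11 threads rather than MPI: the compute loop hands a copy of its data to a single background thread and continues immediately. The class name `BackgroundWriter`, its `submit` method, and the text output format are hypothetical and only serve as illustration; in an MPI code the same role could be played by a dedicated rank.

```cpp
#include <cassert>
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: one background thread drains a queue of snapshots
// to a file, so the compute loop does not block on disk I/O.
class BackgroundWriter {
public:
    explicit BackgroundWriter(const std::string& path)
        : out_(path), worker_([this] { run(); }) {}

    ~BackgroundWriter() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker writes everything still queued first
    }

    // Called from the compute loop: queues a copy of the data and returns.
    void submit(std::vector<double> snapshot) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(snapshot));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !q_.empty(); });
            if (q_.empty()) return;  // only reachable once done_ is set
            std::vector<double> data = std::move(q_.front());
            q_.pop();
            lock.unlock();  // do the slow write without holding the lock
            for (double v : data) out_ << v << ' ';
            out_ << '\n';
            lock.lock();
        }
    }

    std::ofstream out_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::vector<double>> q_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

Inside the time loop, one would construct a single `BackgroundWriter` before the loop and call `submit(results)` at each output step. The copy made by `submit` is the price for letting the next time step overwrite the working arrays while the previous snapshot is still being written.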

Then, check why your parallel version is much slower than the sequential one. Often this means you create tasks that are too small, so communication between threads (synchronization) eats your performance. Consider whether tasks can be combined into chunks, with complete chunks processed in parallel.
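As a rough illustration of chunking, here is a sketch that splits an array into one contiguous chunk per worker instead of creating one task per element. The function name `chunked_sum` and the summation workload are assumptions chosen for brevity; the real computation would replace the `std::accumulate` call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` with `n_workers` threads, one contiguous chunk per thread.
// Each thread synchronizes only once (at join), not once per element.
double chunked_sum(const std::vector<double>& data, std::size_t n_workers) {
    n_workers = std::max<std::size_t>(1, n_workers);
    std::vector<double> partial(n_workers, 0.0);
    std::vector<std::thread> workers;
    // Ceiling division so that every element belongs to some chunk.
    const std::size_t chunk = (data.size() + n_workers - 1) / n_workers;

    for (std::size_t w = 0; w < n_workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& t : workers) t.join();

    // Combine the per-chunk results sequentially; this part is cheap.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

The same reasoning carries over to MPI: sending a few large messages per time step usually costs less than sending many small ones, because each message pays a fixed latency.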

And, of course, use a profiler that works well in a multithreaded environment.
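If no profiler is at hand, a crude first step is to time each phase by hand and see which one grows with the processor count. The sketch below is not a profiler, only manual instrumentation; `PhaseTimer` and the phase names are hypothetical. In an MPI program the same bookkeeping could be done per rank with `MPI_Wtime`.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper that accumulates wall-clock time per named phase.
class PhaseTimer {
public:
    // Runs `work` and adds its duration to the total for `phase`.
    template <typename F>
    void measure(const std::string& phase, F&& work) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        totals_[phase] += std::chrono::duration<double>(stop - start).count();
        ++calls_[phase];
    }

    // Total seconds spent in `phase`, or 0 if it was never measured.
    double seconds(const std::string& phase) const {
        const auto it = totals_.find(phase);
        return it == totals_.end() ? 0.0 : it->second;
    }

    // Number of times `phase` was measured.
    int calls(const std::string& phase) const {
        const auto it = calls_.find(phase);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, double> totals_;
    std::map<std::string, int> calls_;
};
```

Wrapping the sequential computation, the gather, and the file write in separate `measure` calls inside the time loop would show which of the three is responsible for the slowdown.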

[EDIT]

Sequential part = the part of your logic that cannot be (and is not) parallelized; do you mean the same? The sequential part can run a bit slower on multicore, probably because of the OS dispatcher or something like that. It's weird that you see a noticeable difference.

A disk is sequential by nature, so writing to it from many threads doesn't give any benefit, and it can lead to a situation where many threads try to write simultaneously and wait for each other instead of doing something useful.

BTW, what MPI implementation do you use?

Your problem description is too high-level; provide some pseudo-code or something similar, which will help us help you.
