How to speed up this problem by MPI

Question

(1). I am wondering how I can speed up the time-consuming computation in the loop of my code below using MPI?

 int main(int argc, char ** argv)   
 {   
 // some operations           
 f(size);           
 // some operations         
 return 0;   
 }   

 void f(int size)   
 {   
 // some operations          
 int i;           
 double * array =  new double [size];           
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time comsuming computation   
 }           
 // some operations using all elements in array           
 delete [] array;  
 }

As shown in the code, I want to do some operations before and after the part to be paralleled with MPI, but I don't know how to specify where the parallel part begins and ends.

(2) My current code is using OpenMP to speed up the comutation.

 void f(int size)   
 {   
 // some operations           
 int i;           
 double * array =  new double [size];   
 omp_set_num_threads(_nb_threads);  
 #pragma omp parallel shared(array) private(i)  
 {
 #pragma omp for schedule(dynamic) nowait          
 for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?   
 {   
 array[i] = complicated_computation(); // time comsuming computation   
 }          
 } 
 // some operations using all elements in array           
 }

I wonder if I change to use MPI, is it possible to have the code written both for OpenMP and MPI? If it is possible, how to write the code and how to compile and run the code?

(3) Our cluster has three versions of MPI: mvapich-1.0.1, mvapich2-1.0.3, openmpi-1.2.6. Are their usage same? Especially in my case. Which one is best for me to use?

Thanks and regards!

UPDATE:

I like to explain a bit more about my question about how to specify the start and end of the parallel part. In the following toy code, I want to limit the parallel part within function f():

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int main(int argc, char **argv)  
{  
printf("%s\n", "Start running!");  
f();  
printf("%s\n", "End running!");  
return 0;  
}  


void f()  
{  
char idstr[32]; char buff[128];  
int numprocs; int myid; int i;  
MPI_Status stat;  

printf("Entering function f().\n");

MPI_Init(NULL, NULL);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  
    {  
      MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
      printf("%s\n", buff);  
    }  
}  
else  
{  
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d ", myid);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  
MPI_Finalize();  

printf("Leaving function f().\n");  
}

However, the running output is not expected. The printf parts before and after the parallel part have been executed by every process, not just the main process:

$ mpirun -np 3 ex2  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
WE have 3 processors  
Hello 1 Processor 1 reporting for duty  

Hello 2 Processor 2 reporting for duty  

Leaving function f().  
End running!  
Leaving function f().  
End running!  
Leaving function f().  
End running!

So it seems to me the parallel part is not limited between MPI_Init() and MPI_Finalize().

Besides this one, I am still hoping someone could answer my other questions. Thanks!

I don't see any actual difference between this and your previous question: stackoverflow.com/questions/2152422/from-openmp-to-mpi/…. You can split your array like I showed you in my answer. The parallel part begins with MPI_Init and ends with MPI_Finilize, so you can do any serial computations before and/or after these calls. — 3lectrologos
Thank you, 3lectrologos! I just added some updates to my questions to show that it seems not true that the parallel part begins with MPI_Init and ends with MPI_Finilize. — Tim

J Teller J Teller · Accepted Answer · 2010-02-18T18:03:11

Quick edit (because I either can't figure out how to leave comments, or I'm not allowed to leave comments yet) -- 3lectrologos is incorrect about the parallel part of MPI programs. You cannot do serial work before MPI_Init and after MPI_Finalize and expect it to actually be serial -- it will still be executed by all MPI threads.

I think part of the issue is that the "parallel part" of an MPI program is the entire program. MPI will start executing the same program (your main function) on each node you specify at approximately the same time. The MPI_Init call just sets certain things up for the program so it can use the MPI calls correctly.

The correct "template" (in pseudo-code) for what I think you want to do would be:

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);  
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);

    if (myid == 0) { // Do the serial part on a single MPI thread
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }

    ParallelWork();  // Every MPI thread will run the parallel work

    if (myid == 0) { // Do the final serial part on a single MPI thread
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();  
    return 0;  
}

How to speed up this problem by MPI

4 Answers