
After performing calculations to multiply a matrix by a vector using a Cartesian topology, I ended up with the following processes, their ranks, and their local vectors:

P0 (process with rank = 0) = [2, 9]
P1 (process with rank = 1) = [2, 3]
P2 (process with rank = 2) = [1, 9]
P3 (process with rank = 3) = [4, 6]

Now I need to sum the elements of the even-rank processes and the odd-rank ones separately, like this:

temp1 = [3, 18]
temp2 = [6, 9]

and then gather the results into a different vector, like this:

result = [3, 18, 6, 9]

My attempt is to use MPI_Reduce and then MPI_Gather, like this:

// Previous code
double *temp1, *temp2;
if (myrank % 2 == 0) {
    BOOLEAN flag = Allocate_vector(&temp1, local_m); // function to allocate space for vectors
    MPI_Reduce(local_y, temp1, local_n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Gather(temp1, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, comm);
    free(temp1);
}
else {
    Allocate_vector(&temp2, local_m);
    MPI_Reduce(local_y, temp2, local_n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Gather(temp2, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, comm);
    free(temp2);
}

But the answer is not correct. It seems that the code sums the elements of the even and odd processes all together and then crashes, giving Wrong_result = [21 15 0 0] and this error:

*** Error in `./test': double free or corruption (fasttop): 0x00000000013c7510 ***
*** Error in `./test': double free or corruption (fasttop): 0x0000000001605b60 ***


1 Answer


It won't work the way you are trying to do it. To perform a reduction over the elements of a subset of processes, you have to create a subcommunicator for them. In your case, the odd and the even processes share the same comm, so the operations are performed not over two separate groups of processes but over the combined group.

You should use MPI_Comm_split to split the communicator, perform the reduction using the two new subcommunicators, and finally have rank 0 of each subcommunicator (let's call those the leaders) participate in a gather over yet another subcommunicator that contains only those two:

// Make sure rank is set accordingly
MPI_Comm_rank(comm, &rank);

// Split even and odd ranks into separate subcommunicators
MPI_Comm subcomm;
MPI_Comm_split(comm, rank % 2, 0, &subcomm);

// Perform the reduction in each separate group
double *temp;
Allocate_vector(&temp, local_n);
MPI_Reduce(local_y, temp, local_n, MPI_DOUBLE, MPI_SUM, 0, subcomm);

// Find out our rank in subcomm
int subrank;
MPI_Comm_rank(subcomm, &subrank);

// At this point, we no longer need subcomm. Free it and reuse the variable.
MPI_Comm_free(&subcomm);

// Separate both group leaders (rank 0) into their own subcommunicator
MPI_Comm_split(comm, subrank == 0 ? 0 : MPI_UNDEFINED, 0, &subcomm);
if (subcomm != MPI_COMM_NULL) {
  MPI_Gather(temp, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, subcomm);
  MPI_Comm_free(&subcomm);
}

// Free resources
free(temp);

The result will be in gResult of rank 0 in the latter subcomm, which happens to be rank 0 in comm because of the way the splits are performed.
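
One thing the snippet above leaves implicit is the receive buffer: only the root of the final gather needs a full-size gResult, with room for local_n doubles from each of the two leaders. A minimal sketch of that part, assuming the same variable names as above, the usual mpi.h/stdio.h includes, and that Allocate_vector(&v, n) reserves space for n doubles:

double *gResult = NULL;
if (rank == 0)
    Allocate_vector(&gResult, 2 * local_n);   // room for both group sums

/* ... the split / reduce / gather sequence shown above ... */

if (rank == 0) {
    printf("result = [");
    for (int i = 0; i < 2 * local_n; i++)
        printf(" %g", gResult[i]);
    printf(" ]\n");   // with the example data: [ 3 18 6 9 ]
    free(gResult);
}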

Not as simple as expected, I guess, but that is the price of having convenient collective operations in MPI.


On a side note, in the code shown you are allocating temp1 and temp2 to be of length local_m, while in all collective calls the length is specified as local_n. If it happens that local_n > local_m, then heap corruption will occur.
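
If it helps to see everything wired together, below is a complete toy program (my own sketch, not part of the original code) that hard-codes the four example vectors, requires exactly 4 processes, and sizes every buffer to match the counts passed to the collectives. It should print result = [3, 18, 6, 9]:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm comm = MPI_COMM_WORLD;
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (size != 4) {
        if (rank == 0) fprintf(stderr, "Run with exactly 4 processes\n");
        MPI_Abort(comm, 1);
    }

    // The per-process vectors from the question
    const int local_n = 2;
    const double data[4][2] = { {2, 9}, {2, 3}, {1, 9}, {4, 6} };
    double local_y[2] = { data[rank][0], data[rank][1] };

    // Split even and odd ranks into separate subcommunicators
    MPI_Comm subcomm;
    MPI_Comm_split(comm, rank % 2, 0, &subcomm);

    // Sum within each group; the group sum ends up on subrank 0
    double temp[2];
    MPI_Reduce(local_y, temp, local_n, MPI_DOUBLE, MPI_SUM, 0, subcomm);

    int subrank;
    MPI_Comm_rank(subcomm, &subrank);
    MPI_Comm_free(&subcomm);

    // Gather the two group sums on the leaders' subcommunicator
    double gResult[4];
    MPI_Comm_split(comm, subrank == 0 ? 0 : MPI_UNDEFINED, 0, &subcomm);
    if (subcomm != MPI_COMM_NULL) {
        MPI_Gather(temp, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, subcomm);
        MPI_Comm_free(&subcomm);
    }

    if (rank == 0)
        printf("result = [%g, %g, %g, %g]\n",
               gResult[0], gResult[1], gResult[2], gResult[3]);

    MPI_Finalize();
    return 0;
}

Compile and run with your MPI toolchain, e.g. mpicc test.c -o test && mpirun -np 4 ./test.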