Help!

I am running my MPI code and get the runtime error "ONE OF THE PROCESS TERMINATED BADLY: CLEANING UP...process manager error waiting for completion". How can I find out which process (rank) caused the error?

Also, the code runs fine with 4x4 (4 machines with 4 processes each), but fails with 4x6 or more (e.g. 4x8).

My reduce code is below:

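#include <stdio.h>

/* Stand-alone simulation of the reduce schedule: reads the process count
   and one rank, then prints the sends and receives that rank would do. */
int main(void)
{
    int num, rank;
    if (scanf("%d %d", &num, &rank) != 2)
        return 1;
    int depth = 1;   /* distance between partners, doubles every round */
    int flag = 0;    /* 1 when the current round has an odd process count */
    while (num > 1) {
        if (rank < num) {
            flag = num % 2;
            if (rank % 2 != 0) {
                /* odd virtual rank: send to the left neighbour and stop */
                //MPI_Send(to (rank-1)*depth);
                printf("Send to %d\n", (rank - 1) * depth);
                rank *= num;   /* has no effect: the loop exits on the next line */
                break;
            }
            else {
                /* even virtual rank: receive, unless it is the unpaired last rank */
                if (!(flag && (rank == (num - 1)))) {
                    //MPI_Recv(from (rank+1)*depth);
                    printf("Recv from %d\n", (rank + 1) * depth);
                }
                rank /= 2;
            }
            depth *= 2;
        }
        num = num / 2 + flag;
    }
    return 0;
}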

Thank you!

Can you post a reduced code sample? - chrisaycock
You can use a debugger for this. Check this out: stackoverflow.com/questions/329259/… - chemeng
The 'reduced code sample' is, alas, not an MPI program. Can you post a reduced code sample which exhibits the aberrant behaviour you are trying to eliminate? - High Performance Mark

1 Answer

If the problem is caused by an MPI error, e.g. you try to send messages to ranks that do not exist, you should create your own MPI error handler with MPI_Comm_create_errhandler and attach it to the communicator with MPI_Comm_set_errhandler. Inside the handler you can print the rank of the process that produced the error. Even so, you will need to run your code in a debugger to get to the bottom of the problem.
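
Before involving MPI at all, the "ranks that do not exist" hypothesis can be checked offline by replaying the loop from the question for every rank and verifying that each send has a matching receive on a valid rank. The following is only a sketch under that assumption: the names replay_rank, first_bad_rank and MAX_ROUNDS are made up for illustration, and it covers only the partner arithmetic shown in the question, not the actual MPI_Send/MPI_Recv calls, buffers, counts or tags.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ROUNDS 32   /* hypothetical bound: more rounds than an int-sized process count needs */

/* Replays the question's loop for one rank. Stores the real rank this
 * process would send to in *send_to (-1 if it never sends) and the real
 * ranks it would receive from in recv_from[]. Returns the receive count. */
static int replay_rank(int nprocs, int real_rank, int *send_to,
                       int recv_from[MAX_ROUNDS])
{
    int num = nprocs, rank = real_rank, depth = 1, flag = 0, nrecv = 0;
    *send_to = -1;
    while (num > 1) {
        if (rank < num) {
            flag = num % 2;
            if (rank % 2 != 0) {
                *send_to = (rank - 1) * depth;      /* odd: send once, then stop */
                break;
            }
            if (!(flag && rank == num - 1)) {       /* even: receive unless unpaired */
                assert(nrecv < MAX_ROUNDS);
                recv_from[nrecv++] = (rank + 1) * depth;
            }
            rank /= 2;
            depth *= 2;
        }
        num = num / 2 + flag;
    }
    return nrecv;
}

/* Returns -1 if every send has a matching receive and every partner is a
 * valid rank, the lowest offending rank otherwise, or -2 on bad input or
 * allocation failure. */
int first_bad_rank(int nprocs)
{
    if (nprocs < 1)
        return -2;
    int *send_to = malloc(sizeof *send_to * nprocs);
    int *nrecv = malloc(sizeof *nrecv * nprocs);
    int (*recv_from)[MAX_ROUNDS] = malloc(sizeof *recv_from * nprocs);
    int bad = -2;
    if (send_to && nrecv && recv_from) {
        bad = -1;
        for (int r = 0; r < nprocs; r++)
            nrecv[r] = replay_rank(nprocs, r, &send_to[r], recv_from[r]);
        for (int r = 0; r < nprocs && bad < 0; r++) {
            int ok = 1;
            int d = send_to[r];
            if (d != -1) {                          /* the send needs a receive at d */
                ok = 0;
                if (d >= 0 && d < nprocs)
                    for (int i = 0; i < nrecv[d]; i++)
                        if (recv_from[d][i] == r)
                            ok = 1;
            }
            for (int i = 0; i < nrecv[r] && ok; i++) {  /* each receive needs a send */
                int s = recv_from[r][i];
                if (s < 0 || s >= nprocs || send_to[s] != r)
                    ok = 0;
            }
            if (!ok)
                bad = r;
        }
    }
    free(send_to);
    free(nrecv);
    free(recv_from);
    return bad;
}
```

For the process counts mentioned in the question (16, 24 and 32), first_bad_rank returns -1 in this sketch, i.e. the pairing as posted is consistent. That would point to the parts of the real program that the sample leaves out, which is where the error handler and a debugger come in.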