3
votes

I have a problem with the following codes:

Master:

#include <iostream> 
using namespace std;

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PB1 1
#define PB2 1

int main (int argc, char *argv[])
{
  int np[2] = { 2, 1 }, errcodes[2];
  MPI_Comm parentcomm, intercomm;
  char *cmds[2] = { "./slave", "./slave" };
  MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };  
  MPI_Init(NULL, NULL);

#if PB1
  for(int i = 0 ; i<2 ; i++)
    {
      MPI_Info_create(&infos[i]);      
      char hostname[] = "localhost";
      MPI_Info_set(infos[i], "host", hostname);
    }
#endif

  MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);  
  printf("c Creation of the workers finished\n");

#if PB2
  sleep(1);
#endif

  MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
  printf("c Creation of the workers finished\n");

  MPI_Finalize();
  return 0;
}

Slave:

#include "mpi.h"
#include <stdio.h>

using namespace std;

int main( int argc, char *argv[])
{
  int rank;
  MPI_Init(0, NULL);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("rank =  %d\n", rank);

  MPI_Finalize();  
  return 0;
}

I do not know why, when I run mpirun -np 1 ./master, my program stops with the following message when I set both PB1 and PB2 to 1 (it works well when I set one of them to 0):

There are not enough slots available in the system to satisfy the 2 slots that were requested by the application: ./slave Either request fewer slots for your application, or make more slots available for use.

For instance, when I set PB2 to 0, the program works well. Thus, I suppose that it is because MPI_Finalize does not finish its job ...

I googled, but I did not find any answer to my problem. I tried various things, such as calling MPI_Comm_disconnect, adding a barrier, ... but nothing worked.

I work on Ubuntu 15.10 and use OpenMPI version 1.10.2.

I'm not sure this answers your question, but you cannot MPI_Finalize a subset of connected processes. "MPI_Finalize is collective over all connected processes. [...] it is collective over the union of all processes that have been and continue to be connected." Your first bunch of slaves will never finish before you call MPI_Finalize at the master. You could use MPI_Comm_disconnect (see here) - not sure exactly what you tried. – Zulan
I very much appreciate your minimal example, but it might be beneficial in this case to know what you are ultimately trying to achieve. Your concept for spawning slaves may or may not be well thought out, and there may or may not be much better alternatives. This also depends on your actual use case. If this is, for instance, for a batch system, then you won't have any fun acquiring dynamic resources anyway. – Zulan
My objective is to implement an application that calls external software to solve sub-problems (Constraint Satisfaction Problems). In this application the slaves have to communicate a lot with the master. – JML
Do you have control over the external software or are those black boxes? How do you envision communicating with the external software? – Zulan
It is not really a black box. However, I prefer to limit the modifications in order to keep the software user-friendly (for non-specialists). – JML

1 Answer

2
votes

MPI_Finalize on the first set of slaves will not finish until MPI_Finalize is called on the master, because MPI_Finalize is collective over all connected processes. You can work around that by manually disconnecting the first batch of slaves from the intercommunicator before calling MPI_Finalize. This way, the slaves actually finish and exit, freeing the "slots" for the new batch of slaves. Unfortunately, I don't see a standardized way to really ensure the slaves are finished in the sense that their slots are freed, because that would be implementation-defined. The fact that OpenMPI freezes in MPI_Comm_spawn_multiple instead of returning an error is unfortunate, and one might consider that a bug. Anyway, here is a draft of what you could do:

Within the master, each time it is done with a batch of slaves:

MPI_Barrier(intercomm); // Make sure master and slaves are somewhat synchronized
MPI_Comm_disconnect(&intercomm);
sleep(1); // This is the ugly unreliable way to give the slaves some time to shut down

The slave:

MPI_Comm parent;
MPI_Comm_get_parent(&parent); // you should have that already
MPI_Barrier(parent);          // matches the barrier called by the master
MPI_Comm_disconnect(&parent);
MPI_Finalize();  
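
Put together, a minimal sketch of how the two spawn rounds in your master could look with the disconnect added in between (reusing the variable names from your code; the sleep remains the unreliable part):

MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
// ... work with the first batch of slaves here ...
MPI_Barrier(intercomm);          // matches the barrier called by the slaves
MPI_Comm_disconnect(&intercomm); // let the first batch shut down and free its slots
sleep(1);                        // unreliable grace period, as noted above

MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
// ... work with the second batch, then barrier/disconnect again ...
MPI_Finalize();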

However, you still need to make sure OpenMPI knows how many slots should be reserved for the whole application (universe_size). You can do that, for example, with a hostfile:

localhost slots=4

And then mpirun -np 1 ./master.
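
If you want to verify from within the master how many slots the runtime actually knows about, you can query the MPI_UNIVERSE_SIZE attribute. A minimal sketch (whether the attribute is set at all is up to the implementation):

int *universe_size, flag;
MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &universe_size, &flag);
if (flag)
  printf("universe size = %d\n", *universe_size); // total slots available to the application
else
  printf("MPI_UNIVERSE_SIZE is not set\n");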

Now, this is not pretty, and I would argue that your approach of dynamically spawning MPI workers isn't really what MPI is meant for. It may be supported by the standard, but that doesn't help you if implementations are struggling. However, there is not enough information on how you intend to communicate with the external processes to provide a cleaner, more idiomatic solution.

One last remark: do check the return codes of MPI functions, especially MPI_Comm_spawn_multiple.
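
For example, a sketch of such a check, reusing your variable names. Note that the default error handler on MPI_COMM_WORLD is MPI_ERRORS_ARE_FATAL, so you have to switch it to MPI_ERRORS_RETURN to see a return code at all, and that errcodes needs one entry per spawned process (np[0] + np[1] = 3 here), so it is declared with three entries in this sketch:

MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN); // otherwise errors abort the program

int errcodes[3]; // one entry per spawned process: np[0] + np[1] = 3
int rc = MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
if (rc != MPI_SUCCESS)
  fprintf(stderr, "MPI_Comm_spawn_multiple failed with error code %d\n", rc);
for (int i = 0; i < 3; i++)
  if (errcodes[i] != MPI_SUCCESS)
    fprintf(stderr, "spawned process %d reported error code %d\n", i, errcodes[i]);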