I'm trying to set up an MPI client that connects to a server which publishes a port under a certain name, but it doesn't work and I have no idea why.
I'm using Open MPI 1.6 with g++ 4.7, and /usr/lib64/mpi/gcc/openmpi/etc/openmpi-default-hostfile contains a single line:
MY_IP
The following "minimal" (I don't like questions using too much code but I think I should include it here) example illustrates the problem:
mpi_srv.cc
#include <iostream>
#include <mpi.h>

int main (void)
{
    // Note: despite its name, "rank" receives the communicator size here.
    int rank(0);
    MPI_Init(0, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &rank);
    std::cout << "Rank: " << rank << std::endl;

    // Open a port and publish it under a fixed service name.
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Open_port(MPI_INFO_NULL, port_name);
    char publish_name[1024] = "test_port";
    MPI_Publish_name(publish_name, MPI_INFO_NULL, port_name);
    std::cout << "Port: " << publish_name << " (" << port_name << ")" << std::endl;

    // Block until a client connects to the port.
    MPI_Comm client;
    std::cout << "Wating for Comm..." << std::endl;
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    std::cout << "Comm accepted" << std::endl;

    // Clean up: release the communicator, unpublish the name, close the port.
    MPI_Comm_free(&client);
    MPI_Unpublish_name(publish_name, MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
compiled and executed via
mpic++ mpi_srv.cc -o mpi_srv.x
mpirun mpi_srv.x
prints
Rank: 1
Port: test_port (2428436480.0;tcp://MY_IP:33573+2428436481.0;tcp://MY_IP:43172:300)
Wating for Comm...
and blocks as required.
My client
mpi_client.cc
#include <iostream>
#include <mpi.h>

int main (void)
{
    // Note: despite its name, "rank" receives the communicator size here.
    int rank(0);
    MPI_Init(0, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &rank);
    std::cout << "Rank: " << rank << std::endl;

    // Resolve the published service name to the server's port name.
    char port_name[MPI_MAX_PORT_NAME];
    char publish_name[1024] = "test_port";
    MPI_Lookup_name(publish_name, MPI_INFO_NULL, port_name);

    // Connect to the server, then disconnect again.
    MPI_Comm client;
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    MPI_Comm_disconnect(&client);
    MPI_Finalize();
    return 0;
}
compiled and executed via
mpic++ mpi_client.cc -o mpi_client.x
mpirun mpi_client.x
prints
Rank: 1
[MY_HOST:24870] *** An error occurred in MPI_Lookup_name
[MY_HOST:24870] *** on communicator MPI_COMM_WORLD
[MY_HOST:24870] *** MPI_ERR_NAME: invalid name argument
[MY_HOST:24870] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
with the server still running.
I removed the error checking from the examples above, but the function return values indicate that the server publishes the port name successfully. I found out that this problem can arise when the published port is invisible to the client, which happens when the two programs are started with different mpirun executables, but I used the same mpirun executable for both.
Why doesn't the client connect to the server as I'd expect here?