I'm seeing an MPI_ERR_TRUNCATE error with boost::mpi when performing multiple isend/irecv transfers with the same tag using serialized data. These are not concurrent transfers; no threading is involved. There is simply more than one transfer outstanding at the same time. Here's a short test program that exhibits the failure:
#include <iostream>
#include <string>
#include <vector>

#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>

static const size_t N = 2;

int main() {
    boost::mpi::environment env;
    boost::mpi::communicator world;

#if 1
    // Serialized types fail.
    typedef std::string DataType;
#define SEND_VALUE "how now brown cow"
#else
    // Native MPI types succeed.
    typedef int DataType;
#define SEND_VALUE 42
#endif

    DataType out(SEND_VALUE);
    std::vector<DataType> in(N);
    std::vector<boost::mpi::request> sends;
    std::vector<boost::mpi::request> recvs;
    sends.reserve(N);
    recvs.reserve(N);

    // All transfers are self-sends to rank 0; run with a single process.
    std::cout << "Multiple transfers with different tags\n";
    sends.clear();
    recvs.clear();
    for (size_t i = 0; i < N; ++i) {
        sends.push_back(world.isend(0, i, out));   // tag = i
        recvs.push_back(world.irecv(0, i, in[i])); // tag = i
    }
    boost::mpi::wait_all(sends.begin(), sends.end());
    boost::mpi::wait_all(recvs.begin(), recvs.end());

    std::cout << "Multiple transfers with same tags\n";
    sends.clear();
    recvs.clear();
    for (size_t i = 0; i < N; ++i) {
        sends.push_back(world.isend(0, 0, out));   // tag = 0 for every transfer
        recvs.push_back(world.irecv(0, 0, in[i])); // tag = 0 for every transfer
    }
    boost::mpi::wait_all(sends.begin(), sends.end());
    boost::mpi::wait_all(recvs.begin(), recvs.end());

    return 0;
}
In this program I first do 2 transfers on different tags, which works fine. Then I attempt 2 transfers on the same tag, which fails with:
    libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<…>: MPI_Unpack: MPI_ERR_TRUNCATE: message truncated
If I use a native MPI data type, so that serialization is not invoked, things seem to work. I get the same error on MacPorts boost 1.55 with OpenMPI 1.7.3, and on Debian boost 1.49 with OpenMPI 1.4.5. I also tried multiple transfers with the same tag directly through the MPI C interface, and that appeared to work, though of course I could only transfer native MPI data types; a rough reconstruction of that test follows below.
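The C-interface test was structured roughly like this (a sketch, not the exact code; I've assumed one int per transfer, but the shape, N sends and N receives to self on tag 0, matches what I ran):

#include <vector>
#include <mpi.h>

static const int N = 2;

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int out = 42;
    std::vector<int> in(N, 0);
    std::vector<MPI_Request> reqs(2 * N);

    // Post N sends and N receives to self, all on tag 0.
    for (int i = 0; i < N; ++i) {
        MPI_Isend(&out, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[2 * i]);
        MPI_Irecv(&in[i], 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[2 * i + 1]);
    }
    MPI_Waitall(2 * N, reqs.data(), MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}

This completes cleanly for me, which suggests the underlying MPI matching rules themselves are not the problem.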
My question is whether having multiple outstanding transfers on the same tag is a valid operation with boost::mpi, and if so, is there a bug in my program or a bug in boost::mpi?