I'm running a project using Boost MPI (1.55) over Open MPI (1.6.1) on a compute cluster.
Our cluster has nodes with 64 CPUs, and we spawn a single MPI process on each. Most of our communication is between individual processes: each keeps a series of irecv() requests open (for different tags), and sends are carried out with blocking send() calls.
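Concretely, each process keeps one outstanding irecv() per tag and answers with blocking send() calls, along the lines of this bare-bones sketch (the tag values, std::string payloads, and root/worker roles here are just placeholders, not our real code; a fuller simplified version of our implementation is further down):

#include <boost/mpi.hpp>
#include <string>

namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    const int DATATAG = 1, WORKTAG = 2;   // placeholder tag values
    std::string data, work;               // placeholder payload types

    // Each process keeps one outstanding non-blocking receive per tag.
    mpi::request dataReq = world.irecv(mpi::any_source, DATATAG, data);
    mpi::request workReq = world.irecv(mpi::any_source, WORKTAG, work);

    if (world.rank() == 0) {
        // The root hands out data with blocking sends, then polls for replies.
        for (int r = 1; r < world.size(); ++r)
            world.send(r, DATATAG, std::string("some data"));
        for (int replies = 0; replies < world.size() - 1; ) {
            if (workReq.test()) {         // test() never blocks
                ++replies;                // the real code re-posts the irecv here
                if (replies < world.size() - 1)
                    workReq = world.irecv(mpi::any_source, WORKTAG, work);
            }
        }
        dataReq.cancel();                 // the root never receives a DATATAG message
    } else {
        // Workers poll their outstanding receive and answer with a blocking send.
        while (!dataReq.test()) { /* the real code does other work while polling */ }
        world.send(0, WORKTAG, std::string("a reply"));
        workReq.cancel();                 // workers never receive a WORKTAG message
    }
    return 0;
}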
The problem we're seeing is that after a short time of processing (usually under 10 minutes), we get the following error, which terminates the program:
[btl_tcp_component.c:1114:mca_btl_tcp_component_accept_handler] accept() failed: Too many open files in system (23).
Closer debugging shows that it is network sockets that are taking up these file handles, and that we are hitting our OS limit of 65536 open file handles. Most of them are in the TIME_WAIT state, which is apparently what TCP does for (usually) 60 seconds after a socket is closed, in order to catch any late packets. I was under the impression that Open MPI didn't close sockets (http://www.open-mpi.org/faq/?category=tcp#tcp-socket-closing) and just kept up to N^2 sockets open so that every process could talk to every other one. Obviously 65536 is far beyond 64^2 = 4096 (the most common cause of this error involving MPI is simply a file limit lower than N^2), and most of those handles belonged to sockets that had recently been closed.
Our C++ code is too large to fit here, but I've written a simplified version of some of it to at least show our implementation and see if there are any issues with our technique. Is there something in our usage of MPI that would be causing Open MPI to close and reopen so many sockets?
#include <boost/mpi.hpp>

namespace mpi = boost::mpi;
mpi::communicator world;

bool poll(ourDataType& data, mpi::request& dataReq, ourDataType2& work, mpi::request& workReq) {
    if (dataReq.test()) {
        processData(data); // do a bunch of work
        dataReq = world.irecv(mpi::any_source, DATATAG, data); // re-post the receive
        return true;
    }
    if (workReq.test()) {
        int target = assess(work);
        world.send(target, DATATAG, dowork); // blocking send; dowork is defined elsewhere
        workReq = world.irecv(mpi::any_source, WORKTAG, work); // re-post the receive
        return true;
    }
    return false;
}

bool receiveFinish(mpi::request& finishReq) {
    if (finishReq.test()) {
        world.send(0, RESULTS, results);
        resetSelf();
        finishReq = world.irecv(0, FINISH); // re-post the receive
        return true;
    }
    return false;
}

void run() {
    ourDataType data;
    mpi::request dataReq = world.irecv(mpi::any_source, DATATAG, data);
    ourDataType2 work;
    mpi::request workReq = world.irecv(mpi::any_source, WORKTAG, work);
    mpi::request finishReq = world.irecv(0, FINISH); // the root process can call a halt
    while (!receiveFinish(finishReq)) {
        bool doWeContinue = poll(data, dataReq, work, workReq);
        if (doWeContinue) {
            continue;
        }
        // otherwise we do other work
        results = otherwork();
        world.send(0, RESULTS, results);
    }
}
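For reference, the snippet above assumes surrounding glue roughly like the following, which I've trimmed out of the post; the payload types, tag values, globals, and helper bodies here are placeholders standing in for our real ones:

#include <boost/mpi.hpp>
#include <string>

// Placeholder payload types, tags, and helpers for the snippet above.
typedef std::string ourDataType;
typedef std::string ourDataType2;
const int DATATAG = 1, WORKTAG = 2, FINISH = 3, RESULTS = 4;

ourDataType results;                      // filled in by otherwork()
ourDataType dowork;                       // the work package handed out in poll()

void processData(const ourDataType&) {}   // does the actual processing in the real code
int  assess(const ourDataType2&) { return 0; }
ourDataType otherwork() { return ourDataType(); }
void resetSelf() {}

// poll(), receiveFinish(), and run() are defined as in the snippet above.
void run();

int main(int argc, char* argv[]) {
    boost::mpi::environment env(argc, argv); // initializes and finalizes MPI (RAII)
    run();
    return 0;
}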