In an MPI multithreaded environment, MPI calls must be protected with a mutex (or another thread-locking mechanism) when MPI is initialized via MPI_Init_thread with MPI_THREAD_SERIALIZED (see this answer). This is not required with MPI_THREAD_MULTIPLE, but that level is not supported by all MPI implementations.
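To make the serialization requirement concrete, here is a minimal sketch of the pattern I mean (the wrapper name is mine, not from any library):

#include <mutex>
#include <mpi.h>

static std::mutex mpi_mutex; // one global lock that serializes every MPI call

// Under MPI_THREAD_SERIALIZED only one thread may be inside the MPI
// library at a time, so each call is wrapped like this:
static void locked_send(const void *buf, int count, MPI_Datatype type,
                        int dest, int tag, MPI_Comm comm) {
    std::lock_guard<std::mutex> lck(mpi_mutex);
    MPI_Send(buf, count, type, dest, tag, comm);
}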
My question is whether a lock is strictly required for some MPI functions, specifically MPI_Test, MPI_Wait and MPI_Get_count. I know the lock is required for all MPI calls that involve communication (such as MPI_Gather, MPI_Bcast, MPI_Send, MPI_Recv, MPI_Isend, MPI_Irecv, etc.), but I suspect it is not required for other functions, such as MPI_Get_count, which is a local function. I need to know whether the lock is required for functions like MPI_Test, MPI_Wait, MPI_Get_count, MPI_Probe and MPI_Iprobe (I do not know which of these are local and which are not). Is this lock dependency defined by the MPI standard, or is it implementation-defined?
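For example, MPI_Get_count only inspects an MPI_Status object that the application already owns, which is why I suspect it could be exempt (a sketch, assuming a plain blocking receive):

int buff[100];
MPI_Status status;
// The receive itself clearly needs the lock under MPI_THREAD_SERIALIZED:
MPI_Recv(buff, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
int count;
// This only reads fields of 'status' -- does it need the lock too?
MPI_Get_count(&status, MPI_INT, &count);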
I am developing a parallelization library that mixes non-blocking MPI calls with C++11 threads, and I need to use MPI_THREAD_SERIALIZED to support as many MPI implementations as possible. The library also implements MPI_THREAD_MULTIPLE (which gives better performance in most cases), but MPI_THREAD_SERIALIZED support is required as well.
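Internally I plan to do something like the following (a hypothetical sketch; mpi_call is an illustrative name, not an existing API):

#include <mutex>
#include <mpi.h>

static std::mutex mpi_mutex;
static bool need_lock; // set once, right after MPI_Init_thread

// Serialize the call only when the runtime gave us less than
// MPI_THREAD_MULTIPLE support.
template <typename F>
static void mpi_call(F &&f) {
    if (need_lock) {
        std::lock_guard<std::mutex> lck(mpi_mutex);
        f();
    } else {
        f();
    }
}

// After initialization:
//   int provided;
//   MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
//   need_lock = (provided < MPI_THREAD_MULTIPLE);
// and then every MPI call goes through the wrapper:
//   mpi_call([&]{ MPI_Isend(buf, n, MPI_INT, dest, tag, comm, &req); });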
In the simple example code below, is the lock required before the MPI_Test call?
#include <mutex>
#include <vector>
#include <thread>
#include <iostream>
#include <mpi.h>
static std::mutex mutex;
const static int numThreads = 4;
static int rank;
static int nprocs;
static void rthread(const int thrId) {
    int recv_buff[2];
    int send_buff[2];
    MPI_Request recv_request;

    {
        std::lock_guard<std::mutex> lck(mutex); // <-- this lock is required
        MPI_Irecv(recv_buff, 2, MPI_INT, ((rank > 0) ? rank - 1 : nprocs - 1),
                  thrId, MPI_COMM_WORLD, &recv_request);
    }

    send_buff[0] = thrId;
    send_buff[1] = rank;

    {
        std::lock_guard<std::mutex> lck(mutex); // <-- this lock is required
        MPI_Send(send_buff, 2, MPI_INT, ((rank + 1 < nprocs) ? rank + 1 : 0),
                 thrId, MPI_COMM_WORLD); // MPI_INT to match the receive above
    }

    int flag = 0;
    while (!flag) {
        {
            std::lock_guard<std::mutex> lck(mutex); // <-- is this lock required?
            MPI_Test(&recv_request, &flag, MPI_STATUS_IGNORE);
        }
        //... do other stuff (outside the lock)
    }

    std::cout << "[Rank " << rank << "][Thread " << thrId << "] Received a msg from thread "
              << recv_buff[0] << " from rank " << recv_buff[1] << std::endl;
}
int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
    if (provided < MPI_THREAD_SERIALIZED) {
        std::cerr << "MPI_THREAD_SERIALIZED not supported" << std::endl;
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    std::vector<std::thread> threads;
    for (int threadId = 0; threadId < numThreads; threadId++) {
        threads.push_back(std::thread(rthread, threadId));
    }
    for (int threadId = 0; threadId < numThreads; threadId++) {
        threads[threadId].join();
    }

    MPI_Finalize();
    return 0;
}
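For reference, I build and run the example with the standard wrappers, e.g. (example.cpp is just a placeholder filename):

mpicxx -std=c++11 -pthread example.cpp -o example
mpirun -n 2 ./example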
In my tests I ran some code without the locks around the MPI_Test and MPI_Get_count calls; nothing bad happened and performance improved, but I do not know whether this is actually safe.