I'm trying to use robust mutexes on Linux to guard resources shared between processes, and in some situations they do not behave in the "robust" way. By "robust" I mean that pthread_mutex_lock should return EOWNERDEAD if the process owning the lock has terminated.
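For reference, this is a minimal sketch of the behaviour I expect, reduced to a single process: a thread locks a robust mutex and exits without unlocking, and the next locker gets EOWNERDEAD. The function names are made up for illustration, and it assumes a glibc that provides the standardized pthread_mutexattr_setrobust / pthread_mutex_consistent names rather than the _np variants used in my programs.

```cpp
#include <assert.h>
#include <errno.h>
#include <pthread.h>

// Hypothetical helper: lock the robust mutex and exit without unlocking,
// simulating an owner that dies while holding the lock.
static void *die_holding_lock(void *arg)
{
    pthread_mutex_lock(static_cast<pthread_mutex_t *>(arg));
    return NULL;
}

// Hypothetical helper: returns what pthread_mutex_lock reports when the
// previous owner of a robust mutex has already exited.
int lock_after_owner_death()
{
    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setrobust(&ma, PTHREAD_MUTEX_ROBUST);

    pthread_mutex_t mtx;
    pthread_mutex_init(&mtx, &ma);
    pthread_mutexattr_destroy(&ma);

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, die_holding_lock, &mtx);
    assert(rc == 0);
    pthread_join(tid, NULL);            // owner is gone, mutex is still held

    int r = pthread_mutex_lock(&mtx);   // expected: EOWNERDEAD
    if (r == EOWNERDEAD)
        pthread_mutex_consistent(&mtx); // mark the mutex usable again
    if (r == 0 || r == EOWNERDEAD)
        pthread_mutex_unlock(&mtx);
    pthread_mutex_destroy(&mtx);
    return r;
}
```

Calling lock_after_owner_death() from main and printing the result is enough to try it (build with -pthread). This is the single-process case only; it does not reproduce the cross-process problem described in this question.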
Here is the scenario where it doesn't work:
There are two processes, p1 and p2. p1 creates the robust mutex and, after user input, waits on it. p2 has two threads: thread 1 maps the mutex and acquires it; thread 2, after thread 1 has acquired the mutex, maps the same mutex and waits on it (since thread 1 now owns it). Note that p1 starts waiting on the mutex only after p2's thread 1 has already acquired it.
Now if we terminate p2, p1 never unblocks (meaning its pthread_mutex_lock never returns), contrary to the expected "robustness", where p1 should unblock with an EOWNERDEAD error.
Here is the code:
p1.cpp:
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
struct MyMtx {
pthread_mutex_t m;
};
int main(int argc, char **argv)
{
int r;
pthread_mutexattr_t ma;
pthread_mutexattr_init(&ma);
pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
pthread_mutexattr_setrobust_np(&ma, PTHREAD_MUTEX_ROBUST_NP);
int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
ftruncate(fd, sizeof(MyMtx));
MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
PROT_READ | PROT_WRITE, MAP_SHARED,fd, 0);
//close (fd);
pthread_mutex_init(&m->m, &ma);
puts("Press Enter to lock mutex");
fgetc(stdin);
puts("locking...");
r = pthread_mutex_lock(&m->m);
printf("pthread_mutex_lock returned %d\n", r);
puts("Press Enter to unlock");
fgetc(stdin);
r = pthread_mutex_unlock(&m->m);
printf("pthread_mutex_unlock returned %d\n", r);
puts("Before pthread_mutex_destroy");
r = pthread_mutex_destroy(&m->m);
printf("After pthread_mutex_destroy, r=%d\n", r);
munmap(m, sizeof(MyMtx));
shm_unlink("/test_mtx_p");
return 0;
}
p2.cpp:
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
struct MyMtx {
pthread_mutex_t m;
};
static void *threadFunc(void *arg)
{
int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
ftruncate(fd, sizeof(MyMtx));
MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
PROT_READ | PROT_WRITE, MAP_SHARED,fd, 0);
sleep(2); //to let the first thread lock the mutex
puts("Locking from another thread");
int r = 0;
r = pthread_mutex_lock(&m->m);
printf("locked from another thread r=%d\n", r);
return NULL; // a void* thread function must return a value
}
int main(int argc, char **argv)
{
int r;
int fd = shm_open("/test_mtx_p", O_RDWR|O_CREAT, 0666);
ftruncate(fd, sizeof(MyMtx));
MyMtx *m = (MyMtx *)mmap(NULL, sizeof(MyMtx),
PROT_READ | PROT_WRITE, MAP_SHARED,fd, 0);
//close (fd);
pthread_t tid;
pthread_create(&tid, NULL, threadFunc, NULL);
puts("locking");
r = pthread_mutex_lock(&m->m);
printf("pthread_mutex_lock returned %d\n", r);
puts("Press Enter to terminate");
fgetc(stdin);
kill(getpid(), 9);
return 0;
}
First, run p1, then run p2 and wait until it prints "Locking from another thread". Press Enter on p1's shell to lock the mutex, then press Enter on p2's shell to terminate p2, or you can just kill it some other way. You will see that p1 prints "locking..." and pthread_mutex_lock never returns.
The problem doesn't happen every time; it appears to depend on timing. If you let some time elapse after p1 starts locking and before terminating p2, it sometimes works and p1's pthread_mutex_lock returns 130 (EOWNERDEAD). But if you terminate p2 right after, or a short time after, p1 starts waiting on the mutex, p1 never unblocks.
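To make the hang observable instead of blocking forever, the lock in p1 could be replaced with a bounded wait. This is only a diagnostic sketch, not a fix: lock_with_timeout is a hypothetical helper, and it assumes pthread_mutex_timedlock and pthread_mutex_consistent are available.

```cpp
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

// Hypothetical helper: try to lock `mtx`, giving up after `seconds`.
// Returns 0, EOWNERDEAD, ETIMEDOUT, or another error code from
// pthread_mutex_timedlock.
int lock_with_timeout(pthread_mutex_t *mtx, int seconds)
{
    assert(mtx != NULL);

    // pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline.
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += seconds;

    int r = pthread_mutex_timedlock(mtx, &deadline);
    if (r == EOWNERDEAD)
        pthread_mutex_consistent(mtx);  // previous owner died; recover state
    return r;
}
```

In p1 this would be called as lock_with_timeout(&m->m, 5) in place of pthread_mutex_lock(&m->m). An ETIMEDOUT result after p2 has been killed would indicate the case described above, where the owner's death was never reported.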
Has anybody else ever encountered the same issue?