I'm writing a program to test interprocess communication, in particular POSIX shared memory. I'm using POSIX semaphores to synchronize the processes' access to the shared memory. (I read that the POSIX sem_open function lets different processes use the same semaphore, as long as each process opens it with the same name.)
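As a minimal sketch of the name-sharing behaviour described above (the helper `named_sem_demo` and the name `/gr_demo_named_sem` are hypothetical), a child process can open the semaphore by name and block on it until the parent posts it. Named semaphores persist until `sem_unlink` is called, so the sketch unlinks the name before creating it; otherwise a leftover count from an earlier crashed run could be picked up silently. On older glibc versions the program needs to be linked with `-pthread`.

```c
#include <assert.h>
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child blocks on a named semaphore until the
   parent posts it. Both processes refer to it by the same name.
   Returns 0 on success, -1 on failure. */
int named_sem_demo(const char *name)
{
    sem_unlink(name);  /* discard a stale semaphore left by an earlier run */
    sem_t *sem = sem_open(name, O_CREAT, 0644, 0);  /* initial value 0 */
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        sem_close(sem);
        sem_unlink(name);
        return -1;
    }
    if (pid == 0) {
        /* Child: open the existing semaphore by name (no O_CREAT). */
        sem_t *child_sem = sem_open(name, 0);
        if (child_sem == SEM_FAILED)
            _exit(1);
        int rc = sem_wait(child_sem);  /* blocks until the parent posts */
        sem_close(child_sem);
        _exit(rc == 0 ? 0 : 1);
    }

    sem_post(sem);  /* parent: wake the child */
    int status = 0;
    waitpid(pid, &status, 0);
    sem_close(sem);
    sem_unlink(name);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling `named_sem_demo("/gr_demo_named_sem")` should return 0 once the child has been released by the parent's post. On Linux, an open named semaphore typically shows up as `/dev/shm/sem.<name>`, which is a quick way to spot leftovers.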
The problem is that when one process calls sem_post and then sem_wait, the other process never acquires the semaphore. Process 1 just hogs it: it releases the semaphore and then immediately grabs it back itself, without ever giving the other process a chance to run.
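The following minimal sketch reproduces the behaviour described above within a single process (the helper `post_then_wait` and the name `/gr_demo_post_wait` are hypothetical). In practice, for example with glibc on Linux, sem_post increments the count and may wake a blocked waiter, but it does not reserve the count for that waiter: if the posting process calls sem_wait before the woken process gets scheduled, the poster can take the count straight back.

```c
#include <assert.h>
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <stdio.h>

/* Hypothetical helper: posts and then immediately waits on the same
   semaphore in one process, recording the value after each call.
   Returns 0 on success, -1 on failure. */
int post_then_wait(const char *name, int *after_post, int *after_wait)
{
    sem_unlink(name);  /* start from a fresh semaphore */
    sem_t *sem = sem_open(name, O_CREAT, 0644, 0);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return -1;
    }
    sem_post(sem);                  /* count goes from 0 to 1 */
    sem_getvalue(sem, after_post);  /* 1: available to whoever waits first */
    sem_wait(sem);                  /* returns at once: we take it back */
    sem_getvalue(sem, after_wait);  /* 0 again */
    sem_close(sem);
    sem_unlink(name);
    return 0;
}
```

With no other process involved, the recorded values should be 1 after the post and 0 after the wait, which matches the pattern observed with sem_getvalue in process 1.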
Here is the code in process 1:

```c
if ((sem1 = sem_open(request->mem_group.sem_name, O_CREAT, 0644, 0)) ==
    SEM_FAILED) {
    perror("sem_open");
    goto finish;
}
cache = simplecache_get(request->file_path);
*(int *)mem_shared = cache == -1 ? -1 : 1;
sem_post(sem);
sem_wait(sem);
if (cache == -1) {
    /* print before leaving; after the break this line was unreachable */
    fprintf(stdout, "File was not found, going to finish\n");
    break;
}
file_length = lseek(cache, 0, SEEK_END);
lseek(cache, 0, SEEK_SET);
*(size_t *)mem_shared = file_length;
sem_post(sem);
sem_wait(sem1);
if (!file_len) {
    goto finish;
}
bytes_transferred = 0;
while (bytes_transferred < file_len) {
    // rest of while loop here which transfers file
}
```
And here is the block of code in process 2, where it should be acquiring the semaphore but doesn't:

```c
sem_wait(sem1);
file_size = *(size_t *)mem_shared;
gfs_sendheader(ctx, GF_OK, file_size);
sem_post(sem1);
if (!file_size) {
    fprintf(stderr, "File is empty. Go to finish\n");
    break;
}
```
So the idea is that process 2 should acquire the semaphore in between the post and the wait in the other process, at which point the shared memory segment contains data. Instead, it only acquires the semaphore at the very end of the other process, after that process has emptied the shared memory segment and deleted the data in it.
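One common way to get the strict turn-taking described above is to use two semaphores, one per direction, so that neither process ever waits on a semaphore it has just posted itself. The sketch below assumes several things not in the original: the helper `handoff_demo` and the names `/gr_demo_ready` and `/gr_demo_consumed` are hypothetical, an anonymous shared mapping stands in for the shm_open segment, and fork stands in for two separately launched programs.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <fcntl.h>       /* O_CREAT */
#include <semaphore.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: the parent plays "process 1" (producer) and the
   child plays "process 2" (consumer).
     ready:    producer -> consumer, "the data is in shared memory"
     consumed: consumer -> producer, "I have read it, you may overwrite it"
   Returns the value the consumer read, or (size_t)-1 on failure. */
size_t handoff_demo(size_t value)
{
    const char *ready_name = "/gr_demo_ready";
    const char *consumed_name = "/gr_demo_consumed";
    sem_unlink(ready_name);
    sem_unlink(consumed_name);

    sem_t *ready = sem_open(ready_name, O_CREAT, 0644, 0);
    sem_t *consumed = sem_open(consumed_name, O_CREAT, 0644, 0);
    /* slot[0] carries the data; slot[1] carries the consumer's copy back */
    size_t *slot = mmap(NULL, 2 * sizeof *slot, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ready == SEM_FAILED || consumed == SEM_FAILED || slot == MAP_FAILED) {
        perror("setup");
        return (size_t)-1;  /* sketch: cleanup of partial setup omitted */
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return (size_t)-1;  /* sketch: cleanup omitted */
    }
    if (pid == 0) {          /* consumer ("process 2") */
        sem_wait(ready);     /* blocks until the producer has written */
        slot[1] = slot[0];   /* read the data while it is still valid */
        sem_post(consumed);  /* let the producer continue */
        _exit(0);
    }

    slot[0] = value;         /* producer ("process 1"): write the data */
    sem_post(ready);         /* then announce it */
    sem_wait(consumed);      /* wait on the OTHER semaphore */
    slot[0] = 0;             /* only now is it safe to clear the data */

    waitpid(pid, NULL, 0);
    size_t result = slot[1];
    munmap(slot, 2 * sizeof *slot);
    sem_close(ready);
    sem_close(consumed);
    sem_unlink(ready_name);
    sem_unlink(consumed_name);
    return result;
}
```

Calling `handoff_demo(4096)` should return 4096: the consumer always sees the value before the producer clears it, because the producer cannot pass its `sem_wait(consumed)` until the consumer has posted.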
I did a lot of troubleshooting and confirmed that (a) the semaphore is the same semaphore in both processes, and (b) process 1 does at some point increment the semaphore and then acquire it again itself, decrementing it (I checked this with sem_getvalue).
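For the kind of sem_getvalue checks described above, a small tracing helper that tags each reading with the process ID makes interleaved output from both processes easier to read. This is a minimal sketch; `trace_sem` is a hypothetical name. The value is only a snapshot that another process may change immediately afterwards, and on Linux a semaphore with blocked waiters reports 0 rather than a negative count (POSIX allows either).

```c
#include <assert.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical debugging helper: logs the pid, a label, and the current
   semaphore value to stderr, then returns the value (-1 on error). */
int trace_sem(const char *label, sem_t *sem)
{
    int value;
    if (sem_getvalue(sem, &value) == -1) {
        perror("sem_getvalue");
        return -1;
    }
    /* stderr is unbuffered, so lines appear roughly in the order logged */
    fprintf(stderr, "[pid %ld] %s: value=%d\n", (long)getpid(), label, value);
    return value;
}
```

Calling it immediately before and after every sem_post and sem_wait in both processes, with labels such as `"P1 before sem_post(sem)"`, gives a per-process timeline of who took the count and when.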
I am running this on an Ubuntu virtual machine in Oracle VM VirtualBox. The host laptop is a Microsoft Surface Book.
I have been stuck on this problem for 48 hours and am feeling extremely discouraged. Any tips on how to debug it more systematically would also be appreciated.