The following code sometimes blocks on read(fds[0], ...) in spawn() when forking a specific process:
#include <fcntl.h>
#include <unistd.h>
#include <atomic>
#include <mutex>
#include <string>
#include <thread>
#include <vector>
void spawn()
{
    static std::mutex m;
    static std::atomic<int> displayNumber{30000};
    std::string display{":" + std::to_string(displayNumber++)};
    const char* const args[] = {"NullXServer", display.c_str(), nullptr};
    int fds[2];
    m.lock();
    pipe(fds);
    int oldFlags = fcntl(fds[0], F_GETFD);
    fcntl(fds[0], F_SETFD, oldFlags | FD_CLOEXEC);
    oldFlags = fcntl(fds[1], F_GETFD);
    fcntl(fds[1], F_SETFD, oldFlags | FD_CLOEXEC);
    m.unlock();
    if (vfork() == 0) {
        execvp("NullXServer", const_cast<char**>(args));
        _exit(0);
    }
    close(fds[1]);
    int i;
    read(fds[0], &i, sizeof(int));
    close(fds[0]);
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 100; ++i) {
        threads.emplace_back(spawn);
    }
    for (auto& t : threads) {
        t.join();
    }
    return 0;
}
Note: creating the pipe here is essentially useless; it is only done to demonstrate the deadlock. The read(fds[0], ...) in spawn() should never block. All write ends of the pipe have been closed by the time read is called, which should result in read returning immediately. The write end of the pipe in the parent process is closed explicitly, and the write end in the child process is closed implicitly by the FD_CLOEXEC flag set on the file descriptor, which closes the descriptor as soon as execvp succeeds (which it always does in this case).
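The expectation above rests on the end-of-file behaviour of pipes: read() returns 0 once no open descriptor refers to the write end. A minimal single-process sketch of that behaviour follows; the helper name readAfterClosingWriter is hypothetical and not part of the original program.

```cpp
#include <unistd.h>

// Hypothetical helper (not from the original code): creates a pipe, closes
// its only write end, and returns what read() reports on the read end.
ssize_t readAfterClosingWriter()
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;  // pipe creation failed
    }
    close(fds[1]);  // no descriptor refers to the write end any more
    int value = 0;
    // With every write end closed, read() does not block and reports
    // end of file by returning 0.
    ssize_t n = read(fds[0], &value, sizeof(value));
    close(fds[0]);
    return n;
}
```

In this sketch no other process or thread holds a copy of the write end, so the call is expected to return 0 rather than block.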
The problem here is that I do see read() blocking once in a while.
Replacing all of:
m.lock();
pipe(fds);
int oldFlags = fcntl(fds[0], F_GETFD);
fcntl(fds[0], F_SETFD, oldFlags | FD_CLOEXEC);
oldFlags = fcntl(fds[1], F_GETFD);
fcntl(fds[1], F_SETFD, oldFlags | FD_CLOEXEC);
m.unlock();
by:
pipe2(fds, O_CLOEXEC);
fixes the blocking read, even though both pieces of code should at least result in FD_CLOEXEC being set atomically for the pipe file descriptors.
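For reference, a small sketch of the pipe2 variant that also checks the resulting descriptor flags. It assumes a Linux/glibc system where pipe2() is available; the helper names hasCloexec and pipe2SetsCloexec are hypothetical and only serve the illustration.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // pipe2() is a Linux/glibc extension
#endif
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: true if FD_CLOEXEC is set on fd.
bool hasCloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

// Hypothetical check: creates a pipe with pipe2(fds, O_CLOEXEC) and reports
// whether both ends carry FD_CLOEXEC right after creation.
bool pipe2SetsCloexec()
{
    int fds[2];
    if (pipe2(fds, O_CLOEXEC) != 0) {
        return false;  // for instance on a platform that lacks pipe2()
    }
    bool ok = hasCloexec(fds[0]) && hasCloexec(fds[1]);
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

With pipe2 the flag is requested as part of the creating call itself, so no separate fcntl() step is needed.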
Unfortunately, I do not have pipe2 available on all platforms we deploy on.
Can anybody shed some light on why the read would block in the above code using the pipe approach?
Some more observations:
- Extending the mutex lock to cover the vfork() block solves the blocking read.
- Not one system call fails.
- Using fork() instead of vfork() exhibits the same behavior.
- The process that is spawned matters. In this case, a 'null' X server process is spawned on a specific display. Forking 'ls' here, for example, does not block, or the chances that a block occurs are significantly lower; I am not sure which.
- Reproducible on Linux 2.6.18 up to 4.12.8, so I assume this is not a Linux kernel issue.
- Reproducible using both GCC 4.8.2 and GCC 7.2.0.
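To make the first observation concrete, here is a rough sketch of spawn() with the mutex held from pipe() until the child has been created. It is only an illustration of that observation, not a claim about the right fix. The function name spawnWithExtendedLock, the program parameter, and the return value are hypothetical additions so the sketch can be tried with any executable.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <mutex>

// Hypothetical variant of spawn(): the lock covers pipe(), both fcntl()
// calls and vfork(). Returns the result of read() on the pipe
// (0 means end of file), or -1 if pipe() or vfork() failed.
ssize_t spawnWithExtendedLock(const char* program)
{
    static std::mutex m;
    const char* const args[] = {program, nullptr};
    int fds[2];
    {
        std::lock_guard<std::mutex> lock(m);
        if (pipe(fds) != 0) {
            return -1;
        }
        fcntl(fds[0], F_SETFD, fcntl(fds[0], F_GETFD) | FD_CLOEXEC);
        fcntl(fds[1], F_SETFD, fcntl(fds[1], F_GETFD) | FD_CLOEXEC);
        pid_t pid = vfork();
        if (pid == 0) {
            // Child: only exec or _exit, as in the original code.
            execvp(program, const_cast<char**>(args));
            _exit(127);
        }
        if (pid == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }  // the lock is released only after the child exists
    close(fds[1]);
    int value = 0;
    ssize_t n = read(fds[0], &value, sizeof(value));
    close(fds[0]);
    return n;
}
```

Calling it from many threads, for example spawnWithExtendedLock("NullXServer") with the display argument added back, corresponds to the setup in which I no longer see the blocking read. Like the original, the sketch does not wait for the child.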