Is there any approach to update and restart a server keeping its socket in a "suspended" state?

Question

There's a program listening for and answering requests (proprietary binary protocol) on a TCP/IP port. This program needs to be updated, so it has to be restarted, after which it should continue doing its work on the same port.

According to its protocol, all current connections can be closed, because all clients will re-establish new connections right after they are closed. New connections, however, should be retained (but not denied) until the program has been restarted (a few seconds). How could this be done?

So, as soon as it is running again all retained connections on a given port could be released to reach the listening socket.

Let's imagine the following steps:

1. A server program is running and listening on a given port, let's say port A.
2. It asks an external resource (such as the operating system or a third-party module) to retain all connections coming to port A.
3. It closes all connections currently established to port A. THIS MIGHT TAKE TIME (maybe a couple of minutes, because it will first finish all requested services).
4. It is restarted, and a brand new executable comes to life and starts listening on port A.
5. It asks the external resource to release all retained connections, so they can now reach port A, which is ready to receive connections again.

The steps 2 and 4 are just assumptions.

The only way I know to do this is to delegate the listening socket to a separate process. But then you are back to the same issue if THAT process needs to be updated, too. Catch-22. Not worth trying to work around. Just let the server kill its active clients and close its listening port, then reopen the port after the update is finished. Clients that try to reconnect during the update will fail to connect. They should be coded reasonably to retry N times at X intervals until re-connected, and timeout and give up after N failed attempts. — Remy Lebeau
You're better off having several instances of your program behind a load balancer. That way you can bleed off connections from one instance and restart it when it becomes idle. — dbush
@dbush: Sorry, I didn't mention that in this case I don't have multiple hosts (computers). Are there ways to switch connections like a load balancer does, but among processes on the same system? — Luciano
You don't necessarily need multiple hosts for a load balancer. Have the LB running on one port, and each instance of your app on different ports. — dbush
@RemyLebeau: On non-Windows systems, one can use SCM_RIGHTS ancillary payload over a local Unix domain socket, to transfer the listening socket descriptors between processes. So, the helper process only needs to live during the update, and can exit afterwards. In fact, I suggest (answer below) having the service create that child process itself. I guess Windows service writers are limited to using a load balancer, though. — Nominal Animal

Nominal Animal · Accepted Answer · 2018-11-01T21:22:48

In POSIXy systems (Linux, Mac, BSDs) there is a rather simple, but clever way for the service process to achieve this. It does not even need any privileges to do so.

The core idea is very simple: When the service knows it will restart, it will create a detached child process (in a new session and process group, so it'll be reparented to init) holding the listening socket(s). Then, the parent will simply no longer accept() any new connections, finish any incomplete responses, and re-execute itself with the updated binary.
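To illustrate just the "holder" part, here is a minimal Python sketch (Python is used only for brevity; the same calls exist in C as fork(), setsid() and pipe()). The names spawn_holder and demo, and the pipe used to tell the holder to exit, are assumptions made for this example and not part of any existing service. It only shows that the port stays open after the parent closes its own copy of the listening socket; handing the descriptor back to the updated binary is not shown here.

```python
import os
import socket

def spawn_holder(listener):
    """Fork a child whose only job is to keep `listener` open.

    Returns (pid, release_fd). Writing one byte to release_fd (or closing it)
    tells the holder to exit. The pipe is an assumption of this sketch.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: move into a new session and process group.
        os.setsid()
        os.close(w)
        os.read(r, 1)   # block until released, or until the write end is closed
        os._exit(0)     # exiting closes the child's copy of the listening socket
    os.close(r)
    return pid, w

def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(16)
    addr = listener.getsockname()

    pid, release_fd = spawn_holder(listener)
    listener.close()   # the parent's copy is gone; the holder keeps the port open

    # A client can still connect, because the listening socket still exists.
    client = socket.create_connection(addr, timeout=5)
    client.close()

    os.write(release_fd, b"x")   # the "thank you": the holder may exit now
    os.close(release_fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

In a real service the parent would re-execute itself instead of calling waitpid(), and the holder would be released over the Unix domain socket rather than over a pipe.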

The holder process will also listen for incoming connections on a Unix domain socket (stream or seqpacket, i.e. connection-oriented). The updated server instance will connect to the holder process, with an ancillary payload of SCM_CREDENTIALS. That payload includes the kernel-verified user and group the process runs as, and the process ID, which the holder process can use to check whether the connecting party is an updated version of the binary. (In Linux, this can be done by comparing the stat()s of /proc/PID/exe and the expected executable.) If the other end is authorized, the holder transfers the listening socket descriptors back, using an SCM_RIGHTS ancillary payload. Finally, the updated service sends a final thank you, which tells the holder process to exit (which also closes its copies of the listening socket descriptors).
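The descriptor handover itself can be sketched roughly as follows, assuming Python 3.9 or later, where socket.send_fds() and socket.recv_fds() wrap the SCM_RIGHTS ancillary message. The function names hand_over, take_over and demo are made up for this example, both ends run in a single process over a socketpair just to keep it self-contained, and the credential check described above is deliberately left out.

```python
import socket

def hand_over(channel, listener):
    """Holder side: send the listening descriptor as an SCM_RIGHTS payload."""
    # At least one byte of ordinary data has to accompany the ancillary data.
    socket.send_fds(channel, [b"L"], [listener.fileno()])

def take_over(channel):
    """Updated-server side: receive the descriptor and wrap it in a socket."""
    msg, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    if msg != b"L" or len(fds) != 1:
        raise RuntimeError("unexpected handover message")
    # Family, type and protocol are detected from the descriptor itself.
    return socket.socket(fileno=fds[0])

def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    # Stand-in for the Unix domain connection between holder and new server.
    holder_end, server_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    hand_over(holder_end, listener)
    inherited = take_over(server_end)
    listener.close()   # the holder drops its copy; the port stays open

    inherited.settimeout(5)
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _peer = inherited.accept()
    client.sendall(b"ping")
    data = conn.recv(4)
    for s in (client, conn, inherited, holder_end, server_end):
        s.close()
    return port, data
```

The received descriptor refers to the same listening socket in the kernel, so any connections queued while the server was restarting are still waiting on it.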

As long as the backlog (see listen()) is sufficient (or syncookies are enabled in Linux, which makes the backlog essentially unlimited), this should be quite a robust approach.
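The role of the backlog is easy to see in a small experiment. In the Python sketch below (the function name queued_connections and the request strings are invented for the example), nobody calls accept() while the clients connect and send their requests. The kernel completes the handshakes and queues the connections, and the data is still there once accept() is finally called.

```python
import socket

def queued_connections(n_clients=3, backlog=8):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(backlog)
    addr = listener.getsockname()

    # "Restart window": no accept() is running, yet connect() still succeeds.
    clients = []
    for i in range(n_clients):
        c = socket.create_connection(addr, timeout=5)
        c.sendall(b"req%d" % i)
        clients.append(c)

    # "Updated server": start accepting and drain what was queued meanwhile.
    listener.settimeout(5)
    received = []
    for _ in range(n_clients):
        conn, _peer = listener.accept()
        conn.settimeout(5)
        received.append(conn.recv(16))
        conn.close()

    for c in clients:
        c.close()
    listener.close()
    return sorted(received)
```

Here n_clients is kept below backlog on purpose. What happens once the queue is full depends on the operating system and its settings, so the backlog should be sized for the number of clients expected to reconnect during the restart.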

If desired, I can provide example code on how this would work in Linux. (I consider the security aspects critical here, so I would definitely do Linux-only stuff, like examining /proc/PID/exe, to verify that only the updated binary can re-acquire the listening sockets.)
