0
votes

I have a fault tolerant application, where an X Server requests to start an Application on a remote client (by some other mechanism) and receive and display its X-window. Fault tolerance means that the server needs to detect loss of the connection to the client and then call a different back-up-client and start the application there and show the window.

My question is whether there exists a mechanism in the X11 protocol that allows to reliably detect in an X11-Server whether the connection has been broken or not.

Experiments show that when unplugging a cable connection it needs some TCP-Timeout to detect the connection loss on socket level. This is very OS-dependent. In our case it was abut 30 minutes after which the X-Server eventually closed the window.

So another assumption could be that the X11-stream constantly delivers some commands and the server could implement some logic like this: If the X11-stream does not deliver any X11 traffic for a timeout y (e.g. 3 seconds), we assume the connection is lost and actively close the window and establish the connection to the fall-back-client.

Is the assumption true? I did not see any such statement in the X11-protocol about how to detect connection loss. Is there any explicit lifesign that is regularly transmitted? Or is the assumption valid that there is constant traffic? Or could there be longer periods of inactivity where nothing is transmitted at all while the connection is perfectly up and running?

There is a NoOperation command from the client that could be used for such purpose. But do clients usually implement something like that as a lifesign?

2
Are you writing an application (that runs locally, on the X11 server), that launches the remote client? Can you put something in there?ctrl-alt-delor

2 Answers

1
votes

I have a fault tolerant application, where an X Server needs to start an Application...

I don't think that an X server can "start an application". May be that some setup allows something similar to that, but normally is not so.

...whether there exists a mechanism in the X11 protocol that allows to reliably detect in an X11-Server whether the connection has been broken or not.

No, it does not exist. The X11 protocol is based on TCP/IP, which does not provide directly this "heartbeat". I think the assumption is that, if you click or otherwise stimulate an X11 window, the TCP layer will timeout or throw another error if the client application is gone.

I did not see any statement in the X11-protocol about how to detect connection loss.

There is a NoOperation command from the client that could be used for such purpose. But do clients usually implement something like that as a lifesign?

Maybe that some application uses that NoOperation, but the purpose would be different from what you need. I mean, the X11 server is like an extension from the point of view of an application; the application can have interest to know whether the server is up and working, but it is not true the contrary. And, anyway, even if the server could detect that the application is gone, probably there is no way to tell the server to launch another application.

Probably a special proxy could be deployed; it could launch the application and monitor the connection (in both ways) and take the required steps in case the application goes away. But then again, who would monitor the proxy application?

0
votes

First of all, X Protocol relies completely on TCP to send/receive information.

You cannot safely put a timeout capable transaction in order to detect a timeout in TCP. TCP is designed to retransmit only those segments that have already been sent but no acknowledged. It is completely asynchronous, in the sense that you send a command, and you can receive many responses or events unrelated to that command, before you receive the response. There's no heartbeat mechanism on XProtocol (except that the NOOP command is sent to synchronize operations with the server, and you receive a response for it, but you cannot overuse it, as that slows down severely the X connection, just launch any client with the -synchronous option to see it, see X(7)). You can even have TCP connections alive for years without interchanging a single packet. There's some mechanism, activated by option SO_KEEPALIVE that makes tcp to employ such heartbeat on TCP for a connection that has no data to transmit, but the X11 protocol normally doesn't make use of it. You don't post any code, nor a description of how the system is configured. The standard XServer never starts a connection by itself, except when launched specifically to negotiate with an XDMCP server (and this is done on UDP protocol) to serve as an XTerminal.

From your words probably you don't know that the roles of server and client are exchanged in X Protocol (the client is the remote application that connects to the server to display its output, and the server is the application that controls your display, mouse and keyboard) There's no means for the server to create a new client, so you need to be creating this connection in other means (probably through SSH, but not described).

By the way, when you say:

Experiments show that when unplugging a cable connection it needs some TCP-Timeout to detect the connection loss on socket level. This is very OS-dependent. In our case it was abut 30 minutes after which the X-Server eventually closed the window.

That is not OS-dependent. It is precisely the standard behaviour when you don't have traffic to send, there's no packet exchanged, so no detection is made (except if your client ---remember, this is the remote application program that wants to show its data in your local server--- activates the SO_KEEPALIVE option, and it requires several losses before declaring a lost connection) In your case the amount of time is variable because timers don't start until there's some data sent over the unplugged connection, and this makes it variable (not OS dependant)

On other side, you cannot pretend the server is going to turn on your monitor in case you leave the office and turn it off by mistake or by accident. What is the fault tolerance specification in that case?

IMHO, in regard of the presentation protocol, the application should be ready to show you as much information about the system as soon as you activate the connection (but the connection must be something allowed to fail). What is important is the means you develop for the application to be fault tolerant, even in the case you are not there to see the display. Will be somebody be advised that no one is looking at the screen? Are you going to detect the absence of operators in that case? Don't take this as a flame, but common sense should imperate in this case.

In case you need to ensure the connectivity to the remote host is available, you need to use another means to check for it. I recommend you to have a simple application pinging the remote host and alerting in case you don't get a positive result. Or you can open a connection to the server and then close it as soon as you get a positive response from the server (the first packet, for example) This will lead us to the next step, that is to ensure that some human is looking at the (turned on) screen of the display :)

For example, you can run a client in parallel to the one you are interested in, and force a heartbeat by asking for some server atom name (or a root window property value) in a loop with some delay. This will make the connection fail or your client can alert in case it doesn't receive the answer in some configurable time.