Network packet loss causes client code to act strangely

Question

I am facing an issue and need help finding the best way to resolve it.

Here is the problem:

I have server code running which has a socket that is listening to accept new incoming connections.

I then attempt to start a client, which also has a socket that is listening to accept new incoming connections.

The client code begins with accepting a new connection on the listening socket file descriptor and gets a new socket file descriptor for I/O.

The server does the same thing and gets a new socket file descriptor for I/O.

Note: The client is not completely up yet. It needs to receive some bytes from the server and send some back before it can start.

I then introduce some packet loss on the TCP/IP connection. This causes certain errors. For example, the recv() system call in the client process receives no bytes, so the client closes the connection on its side along with the associated new socket file descriptor. However, this leaves the client process hanging: there are other descriptors in the fd_set, but none of them is ready for I/O, so pselect() keeps returning 0. The client needs to send and receive certain bytes over the connection before it can start up.
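For reference, the different outcomes of recv() mentioned above can be told apart explicitly. The following is a minimal sketch, not the code from the question: the names `conn_status`, `classify_recv`, and `read_some` are hypothetical, and it assumes a POSIX system where recv() returning 0 (for a nonzero-length request) means the peer closed the connection in an orderly way.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical classification of a recv() result (illustrative names). */
enum conn_status { CONN_DATA, CONN_RETRY, CONN_CLOSED, CONN_ERROR };

/* n is the value returned by recv(); err is errno captured right after it. */
enum conn_status classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return CONN_DATA;      /* bytes were received */
    if (n == 0)
        return CONN_CLOSED;    /* peer performed an orderly shutdown */
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return CONN_RETRY;     /* transient: call recv() again later */
    return CONN_ERROR;         /* e.g. ECONNRESET or ETIMEDOUT */
}

/* Read once from fd and report what happened; *out_n gets the byte count. */
enum conn_status read_some(int fd, void *buf, size_t len, ssize_t *out_n)
{
    ssize_t n = recv(fd, buf, len, 0);
    int err = errno;           /* save errno before any other call can change it */

    if (out_n)
        *out_n = n;
    return classify_recv(n, err);
}
```

With a classification like this, a main loop could treat CONN_CLOSED or CONN_ERROR on the one connection it needs for startup as a signal to reconnect or exit, rather than going back to pselect() on the remaining descriptors.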

My question is: what should I do here?

I researched the SO_KEEPALIVE option, which I could set on the new socket returned by the accept() system call. But I do not think it would resolve my problem, especially if the network packet loss is ongoing.

Should I kill the client process if I realize there are no file descriptors ready for I/O and never will be? Is there a better way to approach this?

If both your client and your servers only accept incoming connections, who or what is initiating those connections? — Jeremy Friesner
@JeremyFriesner - The code creates parent sockets to listen to any incoming connections in both the server and client code. When anything comes in on that socket, the code uses the accept() system call to create a new socket connection. Hope this helps. — PeterJ
Usually the client would connect to the server. Does that happen here? (if not, then I would characterize both programs as 'servers') — Jeremy Friesner
Yes, you could say that in a way. Both processes are on the localhost as well. — PeterJ

Jeremy Friesner · Accepted Answer · 2018-11-06T05:16:42

If I'm reading the question correctly, the core of the question is: "what should your client program do when a TCP connection that is central to its functionality has been broken?"

The answer to that question is really a matter of preference -- what would you like your client program to do in that case? Or to put it another way, what behavior would your users find most useful?

In many of my own client programs, I have logic included such that if the TCP connection to the server is ever broken, the client will automatically try to create a new TCP connection to the server and thereby recover its connectivity and useful functionality as soon as possible.
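One way such reconnect logic might look is sketched below. This is an illustration under stated assumptions, not the answerer's actual code: the function names `connect_once` and `reconnect_with_backoff`, the doubling delay, and the attempt limit are all hypothetical choices, and it assumes the client is the side that calls connect().

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try once to open a TCP connection; returns a socket fd, or -1 on failure. */
static int connect_once(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Hypothetical helper: retry up to max_attempts times, doubling the wait
 * between attempts up to max_delay_s seconds. Returns a connected fd or -1. */
int reconnect_with_backoff(const char *host, const char *port,
                           int max_attempts, unsigned max_delay_s)
{
    unsigned delay_s = 1;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int fd = connect_once(host, port);
        if (fd >= 0)
            return fd;
        if (attempt + 1 < max_attempts) {
            fprintf(stderr, "connect failed, retrying in %u s\n", delay_s);
            sleep(delay_s);
            delay_s *= 2;
            if (delay_s > max_delay_s)
                delay_s = max_delay_s;
        }
    }
    return -1;                        /* caller decides whether to exit */
}
```

A caller might invoke `reconnect_with_backoff("127.0.0.1", "5000", 5, 8)` (the port here is only a placeholder) whenever the connection is found to be broken, and fall back to exiting with an error if it returns -1.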

The other obvious option would be to just have the client quit when the connection is broken; perhaps with some sort of error indication so that the user will know why the client went away. (perhaps an error dialog that asks if the user would like to try to reconnect?)

SO_KEEPALIVE is probably not going to help you much in this scenario, by the way -- despite its name, its purpose is to help a program discover in a more timely manner that TCP connectivity has been lost, not to try harder to keep a TCP connection from being lost. (And it doesn't even serve that purpose particularly well by default, since in many TCP stacks the first keepalive probe is not sent until the connection has been idle for about two hours, which means that even with SO_KEEPALIVE enabled it can be a very long time before your program starts receiving errors reflecting the loss of network connectivity.)
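If quicker detection is wanted, some platforms let the keepalive timers be shortened per socket. The sketch below assumes the Linux-style options TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT from `<netinet/tcp.h>`; where those are not defined it only enables SO_KEEPALIVE with the system defaults. The function name `enable_fast_keepalive` and the timer values are hypothetical, and this still only helps a program notice a dead connection sooner, it does not prevent the loss.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on fd and, where supported, shorten its timers.
 * idle_s:     seconds of idleness before the first probe
 * interval_s: seconds between unanswered probes
 * probes:     unanswered probes before the connection is reported dead
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_fast_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Per-socket overrides (assumed available, as on Linux). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) != 0)
        return -1;
#else
    /* No per-socket tuning here; the system-wide defaults apply. */
    (void)idle_s;
    (void)interval_s;
    (void)probes;
#endif
    return 0;
}
```

For example, `enable_fast_keepalive(fd, 10, 5, 3)` would, on a platform with these options, ask for the first probe after 10 idle seconds and give up after 3 unanswered probes sent 5 seconds apart.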
