How to implement a retry strategy for outbound messages over TCP

Question

I have a system, written in Java, that sends messages over a TCP socket to external systems. If we want to do our best to make sure sent messages are received, do we need to perform retry logic when a network failure is caught in the send method?

In my opinion there's no need to retry, since retransmission is already handled by TCP under the hood. Is this argument valid?

Thanks!

Bo

TCP indeed handles that for you. Once you send out data and it has been given to the underlying kernel without error, it is out of your hands. If you get a send error, you have to disconnect the socket and connect a new socket before you can exchange data again. — Remy Lebeau
Thanks Remy! My question is more about whether retrying makes sense here. Since TCP has already done retries for me, do I still need retries in my business logic? If yes, what retry strategy should be used? — bolei
You can't implement your own retries in TCP (only in UDP). Once the socket errors, that's it. The socket is in a bad/unknown state, all you can do is drop the connection and create a new one. — Remy Lebeau
The only way you can know whether the receiving app has got the messages you are sending is by ACKing every message, i.e., the receiving application has to send a message back saying it has received it. This is usually cumbersome, which leads to the introduction of sequence numbers on messages, with the receiver tasked with keeping track of which messages it has received and asking for re-sends if any are missed. If you do not have that level of control over the receiver, then you should catch errors and re-send, but that's all you can do, as Vovanrock2002 explains below. — Erik
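A minimal sketch of the "catch errors and re-send" approach from the comments above, assuming a plain byte-oriented protocol with no application-level acknowledgements. ConnectionFactory and ReconnectingSender are hypothetical names, not part of any library; in production the factory might be something like () -> new Socket(host, port).getOutputStream(). On an IOException the old connection is closed and replaced, as Remy describes, and the send is retried with exponential backoff.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical: opens a brand-new connection and returns its output stream. */
interface ConnectionFactory {
    OutputStream open() throws IOException;
}

/** Hypothetical sender that drops a failed connection, reconnects, and retries with backoff. */
class ReconnectingSender {
    private final ConnectionFactory factory;
    private final int maxAttempts;
    private final long baseBackoffMillis;
    private OutputStream out; // null means "no usable connection"

    ReconnectingSender(ConnectionFactory factory, int maxAttempts, long baseBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.baseBackoffMillis = baseBackoffMillis;
    }

    void send(byte[] message) throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (out == null) {
                    out = factory.open(); // always a fresh connection, never the broken one
                }
                out.write(message);
                out.flush();
                return; // bytes accepted by the local stack: NOT proof the peer processed them
            } catch (IOException e) {
                last = e;
                closeQuietly(); // the old socket is in an unknown state, so drop it
                if (attempt < maxAttempts) {
                    Thread.sleep(baseBackoffMillis << (attempt - 1)); // 1x, 2x, 4x, ...
                }
            }
        }
        throw last;
    }

    private void closeQuietly() {
        if (out != null) {
            try {
                out.close(); // for a Socket's stream this also closes the socket
            } catch (IOException ignored) {
                // nothing useful to do with a close failure here
            }
            out = null;
        }
    }
}
```

Note the limitation the accepted answer explains: a message whose send failed on the old connection may still have reached the peer, so a retry can deliver it twice, and a successful write only proves that the local kernel accepted the bytes.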

u354356007 · Accepted Answer · 2017-08-04T07:48:31

It isn't that simple.

It is important to understand that TCP is a transport protocol. Its purpose is to set up reliable data streams between machines, not between applications. A successful send does not mean the message has been delivered; it means that your kernel has queued the message and will send it soon. When a host receives a message and acknowledges it, that doesn't mean the application on that host has called, or will ever call, recv and actually got the data.
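A small sketch that illustrates this, assuming loopback networking is available: a local peer accepts the connection but its "application" never reads, yet the client's write still completes without error, because the bytes only had to be queued by the kernel. SendIsNotDelivery is a hypothetical name used only for this demo.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo: write() succeeds even though the peer application never reads. */
class SendIsNotDelivery {
    static int writeToNonReadingPeer(int size) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);     // port 0: any free port
             Socket client = new Socket(loopback, server.getLocalPort()); // handshake done by the kernel
             Socket peer = server.accept()) {                             // peer never calls read()
            OutputStream out = client.getOutputStream();
            out.write(new byte[size]); // returns once the kernel has queued the bytes
            out.flush();
            return size;               // "sent", yet nothing was read on the other side
        }
    }
}
```

With a payload larger than the combined socket buffers, write() would eventually block instead of returning; that is TCP flow control, still not an application-level acknowledgement.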

So you can certainly rely on socket errors to detect a broken connection. But when a connection breaks, your application won't be able to determine what was delivered and what wasn't based solely on information you'll be able to get from TCP.

The application-level protocol you develop for your program thus needs its own mechanism for tracking what was successfully delivered between applications. A reconnect-and-retry mechanism will be part of it.
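One possible shape for that tracking mechanism, sketched under the assumption of a hypothetical protocol in which every message carries a sequence number and the receiving application sends back a cumulative acknowledgement ("I have processed everything up to N"). Outbox is an illustrative name. Messages stay in the outbox until acknowledged, regardless of whether the socket write succeeded; the methods are synchronized on the assumption that acknowledgements are read on a different thread from the one that sends.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sender-side outbox keyed by sequence number. */
class Outbox {
    private final TreeMap<Long, byte[]> unacked = new TreeMap<>();
    private long nextSeq = 1;

    /** Stores the message and returns the sequence number to put in its header. */
    synchronized long enqueue(byte[] payload) {
        long seq = nextSeq++;
        unacked.put(seq, payload);
        return seq;
    }

    /** Cumulative ack from the peer application: 1..seq were processed, forget them. */
    synchronized void ackUpTo(long seq) {
        unacked.headMap(seq, true).clear(); // clearing the view removes from the backing map
    }

    /** Snapshot of everything still unacknowledged, in sequence order. */
    synchronized SortedMap<Long, byte[]> pending() {
        return new TreeMap<>(unacked);
    }
}
```

After a reconnect, whatever pending() still holds is what must be re-sent; the receiver should discard any sequence number it has already processed, since some of those messages may have arrived before the connection broke.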

Example

Alice sends packets with updates on her status to Bob over TCP. At some point Alice has three updates to send. All of Alice's send calls succeed.
The first update is received by Bob. The second update is received by his machine, but the machine suddenly reboots before Bob calls recv a second time.
The third update is discarded by an intermediate security device in Bob's office.
Alice then wants to send one more update. The connection is already in a bad state, so send fails.

At this point Alice has sent three updates, but Bob has received only one. When Bob's machine is back online and a new connection is established, Alice and Bob need to synchronize their state before they proceed further, because Alice doesn't actually know how many updates Bob was able to process. Based on TCP alone, she can only make an educated guess about how many updates Bob's machine received.
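The synchronization step for the Alice and Bob example could look like the following sketch. It assumes a made-up wire format (an 8-byte sequence number, a 4-byte length, then the payload) and a handshake in which Bob's application, right after reconnecting, reports the sequence number of the last update it actually processed. Resync, Frame and the method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical resume-point handshake run on every new connection. */
class Resync {
    static final class Frame {
        final long seq;
        final byte[] payload;

        Frame(long seq, byte[] payload) {
            this.seq = seq;
            this.payload = payload;
        }
    }

    /** Bob's side: announce the last update his application processed. */
    static void sendLastProcessed(DataOutputStream out, long lastProcessed) throws IOException {
        out.writeLong(lastProcessed);
        out.flush();
    }

    /** Alice's side: read Bob's resume point, then re-send every update after it. */
    static long resendAfterResumePoint(DataInputStream in, DataOutputStream out,
                                       SortedMap<Long, byte[]> unacked) throws IOException {
        long lastProcessed = in.readLong();
        for (Map.Entry<Long, byte[]> e : unacked.tailMap(lastProcessed + 1).entrySet()) {
            writeFrame(out, e.getKey(), e.getValue());
        }
        out.flush();
        return lastProcessed;
    }

    static void writeFrame(DataOutputStream out, long seq, byte[] payload) throws IOException {
        out.writeLong(seq);
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Bob's side: read one length-prefixed frame. */
    static Frame readFrame(DataInputStream in) throws IOException {
        long seq = in.readLong();
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload); // blocks until the whole payload has arrived
        return new Frame(seq, payload);
    }
}
```

In the example above, Bob would report 1, so Alice re-sends updates 2 and 3 before her new fourth update. Because Bob reports what his application processed rather than what his machine received, the update lost in the reboot is re-sent too.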
