0
votes

After developing a sample client-server application that can exchange some data, I'm trying to add a retry mechanism to it. Currently my application follows the protocol below:

  1. Client connects to server (non-blocking mode) with a 3-second timeout and 2 retries.
  2. Client starts sending fixed-length data. The send code checks whether the complete data was sent.
  3. Receive the response from the server (timeout: 3 seconds) and verify it. If an incorrect response is received, re-send the data and wait for the response again. Repeat this up to two times on failure.

For the above implementation, the code sections look something like this:

  1. connect() and select() for opening connection
  2. select() and send() for data send
  3. select() and recv() for data receiving

Now I'm making the retries based on the return values of the socket functions: if send() or recv() fails, I retry the same call, but I do not call connect() again.

I tested this by restarting the server in the middle of a data transfer. As a result, the client fails to communicate with the server and quits after several retries. I believe this happens because there is no connect() call in the retry path.

Any suggestions?

Example code for receiving socket data

bool CTCPCommunication::ReceiveSocketData(char* pchBuff, int iBuffLen)
{
  bool bReturn = true;

  //check whether the socket is ready to receive
  fd_set stRead;
  FD_ZERO(&stRead);
  FD_SET(m_hSocket, &stRead);
  int iRet = select(0, &stRead, NULL, NULL, &m_stTimeout);

  //if socket is not ready this line will be hit after 3 sec timeout and go to the end
  //if it is ready control will go inside the read loop and reads data until data ends or
  //socket error is getting triggered continuously for more than 3 secs.
  if ((iRet > 0) && (FD_ISSET(m_hSocket, &stRead)))
  {
    DWORD dwStartTime = GetTickCount();
    DWORD dwCurrentTime = 0;

    while ((iBuffLen-1) > 0)
    {
      int iRcvLen = recv(m_hSocket, pchBuff, iBuffLen-1, 0);
      dwCurrentTime = GetTickCount();

      //receive failed due to socket error
      if (iRcvLen == SOCKET_ERROR)
      {
        if((dwCurrentTime - dwStartTime) >= SOCK_TIMEOUT_SECONDS * 1000)
        {
          WRITELOG("Call to socket API 'recv' failed after 3 secs continuous retries, error: %d", WSAGetLastError());
          bReturn = false;
          break;
        }
      }
      //connection closed by remote host
      else if (iRcvLen == 0)
      {
        WRITELOG("recv() returned zero - time to do something: %d", WSAGetLastError());
        break;
      }

      pchBuff  += iRcvLen;
      iBuffLen -= iRcvLen;
    }
  }
  else
  {
    WRITELOG("Call to API 'select' failed inside 'ReceiveSocketData', error: %d", WSAGetLastError());
    bReturn = false;
  }

  return bReturn;
}
1
You need to check for fatal errors, and if you get one, you need to close the connection and create a new one. You can't keep sending on a dead connection. (You can't select on a dead connection either; there's nothing to wait for.) - David Schwartz
@DavidSchwartz: Could you tell me a bit more about fatal errors? Which SOCKET_ERROR codes are you talking about? I'm not clear on when to retry send/recv and when to retry with connect(), i.e. building the socket from scratch. - hypheni
Pretty much the only time you should retry a send or recv is if it is interrupted by a signal or would have blocked. All other errors are fatal to the connection. - David Schwartz
For a non-blocking socket, would it be okay to call select, check for readability/writability, then call recv/send and check for SOCKET_ERROR, and if one is found, just reconnect with a new socket? - hypheni
No, for two reasons. First, you might get interrupted by a signal. Second, a hit on select doesn't guarantee that a subsequent operation won't fail with EWOULDBLOCK. For an obvious example, suppose you get a write hit and then try to write 64MB. - David Schwartz
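
The retry rule described in the comments above can be sketched as a small classifier. This is only a minimal sketch, not code from the thread: ClassifySocketError, SocketAction, kErrInterrupted and kErrWouldBlock are hypothetical names, and the caller is assumed to pass in WSAGetLastError() on Winsock or errno on POSIX right after a failed send() or recv().

```cpp
#ifdef _WIN32
#include <winsock2.h>
// Winsock reports socket errors through WSAGetLastError().
constexpr int kErrInterrupted = WSAEINTR;
constexpr int kErrWouldBlock  = WSAEWOULDBLOCK;
#else
#include <cerrno>
// POSIX sockets report errors through errno.
constexpr int kErrInterrupted = EINTR;
constexpr int kErrWouldBlock  = EWOULDBLOCK;
#endif

// What the caller should do after send()/recv() reported an error.
enum class SocketAction { RetrySameCall, Reconnect };

// Hypothetical helper: only "interrupted" and "would block" are treated as
// retryable on the same socket; everything else is treated as fatal to the
// connection, so the socket should be closed and a new one connected.
inline SocketAction ClassifySocketError(int err)
{
    if (err == kErrInterrupted || err == kErrWouldBlock)
        return SocketAction::RetrySameCall;
#ifndef _WIN32
    // POSIX allows EAGAIN and EWOULDBLOCK to be different values.
    if (err == EAGAIN)
        return SocketAction::RetrySameCall;
#endif
    return SocketAction::Reconnect;
}
```

With a helper like this, the receive loop would retry recv() only when the result is RetrySameCall, and would otherwise close the socket and go back to the connect step.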

1 Answer

1
votes

Currently my application follows the protocol below:

  1. Client connects to server (non-blocking mode) with a 3-second timeout and 2 retries.

You can't retry a connection. You have to close the socket whose connect attempt failed, create a new socket, and call connect() again.
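
A minimal sketch of that close, re-create, and reconnect cycle follows. It is an illustration under stated assumptions, not the asker's code: ConnectWithRetry is a hypothetical helper, it uses a plain blocking connect() to stay short, and it uses POSIX names. With Winsock the same shape applies with SOCKET, INVALID_SOCKET and closesocket().

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns a connected socket descriptor, or -1 if all
// maxAttempts connection attempts failed.
int ConnectWithRetry(const sockaddr_in& addr, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        // A socket whose connect() failed is never reused: every attempt
        // starts from a freshly created socket.
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;  // cannot create a socket at all; give up

        if (connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) == 0)
            return fd;  // connected; the caller now owns the descriptor

        close(fd);      // discard the failed socket before the next attempt
    }
    return -1;
}
```

The same function can be called again later when a send or receive fails with a fatal error, after closing the old socket.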

  2. Client starts sending fixed-length data. The send code checks whether the complete data was sent.

This isn't necessary in blocking mode: the POSIX standard guarantees that a blocking-mode send() will send all the data, or fail with an error.

  3. Receive the response from the server (timeout: 3 seconds) and verify it. If an incorrect response is received, re-send the data and wait for the response again. Repeat this up to two times on failure.

This is a bad idea. Most probably either all the data will arrive, including all the retries, or none of it will. You need to make sure that your transactions are idempotent if you use this technique. You also need to pay close attention to the actual timeout period. 3 seconds is not adequate in general. A starting point is double the expected service time.
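
One common way to make re-sent requests idempotent is to tag each request with an ID and have the server replay the stored reply when it sees the same ID again. The sketch below only illustrates that idea; IdempotentHandler, its methods, and the "ok:" reply format are all hypothetical and not part of the asker's protocol.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical server-side handler: each request carries a client-chosen ID.
// A repeated ID returns the stored reply instead of doing the work again.
class IdempotentHandler
{
public:
    std::string Handle(std::uint64_t requestId, const std::string& payload)
    {
        auto it = m_replies.find(requestId);
        if (it != m_replies.end())
            return it->second;            // duplicate (a retry): replay the reply

        ++m_applied;                      // stand-in for the real side effect
        std::string reply = "ok:" + payload;
        m_replies.emplace(requestId, reply);
        return reply;
    }

    // Number of requests whose side effect was actually performed.
    int AppliedCount() const { return m_applied; }

private:
    std::unordered_map<std::uint64_t, std::string> m_replies;
    int m_applied = 0;
};
```

With this shape, a client that re-sends request 7 after a timeout gets the same reply both times, and the side effect happens once.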

For the above implementation, the code sections look something like this:

   connect() and select() for opening connection
   select() and send() for data send
   select() and recv() for data receiving

You don't need the select() in blocking mode. You can just set a read timeout with SO_RCVTIMEO.

Now I'm making the retries based on the return values of the socket functions: if send() or recv() fails, I retry the same call, but I do not call connect() again.

I tested this by restarting the server in the middle of a data transfer. As a result, the client fails to communicate with the server and quits after several retries. I believe this happens because there is no connect() call in the retry path.

If that were true, you would get an error saying so.