This is an old problem I abandoned a while ago because I could find no solution and it only affected one server (so I just put my service somewhere else). This user has a problem with identical symptoms to mine:

  • C# .NET synchronous TCP server
  • a TcpClient object is assigned by blocking on a TcpListener with the AcceptTcpClient method
  • once there's a TcpClient object, I pass it to a thread that invokes the client's GetStream method to create a NetworkStream
  • the thread loops on this NetworkStream, calling networkStream.Read(someBuffer, 0, 4096) on each iteration
  • right now client and server are located on the same network, with no congestion to speak of
  • my server has plenty of memory to spare
  • if I load my server software onto another machine, the problem goes away
  • the kicker: traffic from a Linux box on the same network gets through fine and on time

The tcpListener.AcceptTcpClient() method blocks for around a minute at a time, so the server ends up reading one huge block of bytes a short while later. What it should do instead is networkStream.Read() small blocks of bytes as frequently as they are sent (the client sends them every 5 s, not once a minute).
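
For reference, the server structure described in the list above can be sketched roughly as follows. This is a minimal sketch, not the actual code from the project: the port number and the class and method names are assumptions made up for illustration. It logs a timestamp for every Read so the application's view of arrival times can be compared against a packet analyzer.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

class BlockingServerSketch
{
    // Assumption: the original post does not name a port.
    const int Port = 9000;

    static void Main()
    {
        TcpListener listener = new TcpListener(IPAddress.Any, Port);
        listener.Start();

        while (true)
        {
            // Blocks until a client connects.
            TcpClient client = listener.AcceptTcpClient();

            // One thread per client, as described in the question.
            Thread worker = new Thread(HandleClient);
            worker.IsBackground = true;
            worker.Start(client);
        }
    }

    static void HandleClient(object state)
    {
        TcpClient client = (TcpClient)state;
        byte[] buffer = new byte[4096];

        using (client)
        using (NetworkStream stream = client.GetStream())
        {
            int bytesRead;
            // Read blocks until data is available and returns 0 once the peer closes.
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Timestamp each read so it can be lined up with the capture.
                Console.WriteLine("{0:HH:mm:ss.fff} read {1} bytes",
                    DateTime.Now, bytesRead);
            }
        }
    }
}
```

With a client sending every 5 s, a healthy run should print a small read roughly every 5 s; the symptom described here would instead show one large read about once a minute.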

Previous comments to the other user suggest subpar networking or connectivity issues might be to blame, which at first seems reasonable. But this isn't actually the case.

I went one step further and installed packet analyzers at both the client and server. Results:

  • the instant the client sends a packet, it shows up on the server's analyzer
  • network latency or connectivity are NOT the problem
  • the packets/frames are arriving at the server at the correct time
  • somewhere between the network interface card that my analyzer is monitoring and my application, something is causing this delay
  • the .NET runtime is the only thing between my application and the network interface
  • some kind of socket error in .NET is the cause of this huge latency

Environment:

  • in my specific case I'm using an Intel PRO/1000 MT network card
  • Windows Server 2003 R2 Standard Edition, SP2
  • .NET Frameworks installed: 2.0 SP2, 3.0 SP2, 3.5 SP1, 4 Client Profile, 4 Extended

If anyone has any advice I would very much like to know what it is.

How are you dealing with concurrent requests? I.e., do you spawn a thread per client, or something else? Also, at the time this occurs, it might be helpful to check how many concurrent clients are connected (to make sure you're not hitting a limit where several dead/broken clients hang around; it might only take someone doing port scanning to fire off a ton of threads in your app). - nos
@nos There's a single entry point, and each client is assigned its own TcpClient object, which gets its own thread to process the incoming contents of a NetworkStream.Read(). So yeah, 1 thread per client, with few clients using the service for the foreseeable future. And in test there is only 1 client connected. - user1110648
The only thing I can think of, besides an issue between the OS stack and your application, is something in the packet headers sent over the wire. To be more specific, perhaps the PSH bit is not set in the TCP headers in the case where it fails. You could try running a packet sniffer on a server that works and on one that doesn't, then compare the two and see if there are any differences. - Scott Smith
It sounds like a connection issue and with your current code it will block other connections until that 1 connection is successful. You should try accepting clients asynchronously with BeginAcceptTcpClient. - Will
@High Right now, in test, there's only one client, so there's zero competition for that blocking AcceptTcpClient on the listener. Also, my sniffer indicates that the PSH bit is set in the header. Yeah, I'm going to run my project with the sniffer on a server it works on. - user1110648
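
For anyone who wants to try the BeginAcceptTcpClient suggestion from the comments, a minimal sketch might look like the following. The port number and the class and method names are assumptions for illustration, and the callback closes the connection right away where a real server would hand the client to its read loop.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

class AsyncAcceptSketch
{
    // Assumption: port chosen for illustration only.
    const int Port = 9000;

    static void Main()
    {
        TcpListener listener = new TcpListener(IPAddress.Any, Port);
        listener.Start();

        // Returns immediately; OnAccept runs when a client connects.
        listener.BeginAcceptTcpClient(OnAccept, listener);

        Console.WriteLine("Listening; press Enter to quit.");
        Console.ReadLine();
        listener.Stop();
    }

    static void OnAccept(IAsyncResult ar)
    {
        TcpListener listener = (TcpListener)ar.AsyncState;
        TcpClient client;
        try
        {
            client = listener.EndAcceptTcpClient(ar);
        }
        catch (ObjectDisposedException)
        {
            return; // The listener was stopped while an accept was pending.
        }
        catch (SocketException)
        {
            return; // The pending accept failed or was aborted.
        }

        // Re-arm the accept first so the next client is not kept waiting.
        listener.BeginAcceptTcpClient(OnAccept, listener);

        Console.WriteLine("{0:HH:mm:ss.fff} accepted {1}",
            DateTime.Now, client.Client.RemoteEndPoint);

        // Placeholder: a real server would start its per-client read loop here.
        client.Close();
    }
}
```

This only changes how connections are accepted. With a single connected client, as in the test described above, it would not be expected to change how promptly Read returns.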

1 Answer

This may be due to one of the following.

  • Most Likely: Hardware TCP offload.

That model of network card has been observed to have trouble with TCP offload in other situations. You can disable this in the device driver configuration.

If it is a problem with offloaded handling of segmentation, then you may find it only occurs on certain network routes, which may explain the observed difference between your Linux client and your Windows client.

Example: http://forums.novell.com/novell-product-support-forums/netware/nw-other/communications/187741-offload-tcp-segmentation-intel-pro-1000-mt.html
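
Before changing anything, it can help to see whether offload has already been switched off stack-wide. The sketch below only reads two registry values and changes nothing. It rests on two assumptions: that DisableTaskOffload and EnableTCPChimney under the Tcpip\Parameters key are the relevant stack-wide switches on this Server 2003 SP2 machine, and that the per-adapter settings in the driver's advanced properties (which this does not inspect) may matter more for the Intel card.

```csharp
using System;
using Microsoft.Win32;

class OffloadSettingsSketch
{
    const string TcpipParameters =
        @"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters";

    static void Main()
    {
        // Open the key read-only; nothing is modified.
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(TcpipParameters))
        {
            if (key == null)
            {
                Console.WriteLine("Tcpip\\Parameters key not found.");
                return;
            }

            // Assumption: these are the stack-wide values of interest here.
            string[] names = { "DisableTaskOffload", "EnableTCPChimney" };
            foreach (string name in names)
            {
                object value = key.GetValue(name);
                Console.WriteLine("{0} = {1}", name, value ?? "(not set)");
            }
        }
    }
}
```

A value that is not set means the system default applies, so an empty result does not by itself show whether offload is active on the adapter.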

  • Less Likely: Path MTU problems.

Path MTU is supposed to be discovered automatically, but if an intervening router is dropping all ICMP traffic (including the "fragmentation needed" messages that discovery relies on), then you may see hanging connections. In your case the connection succeeds eventually, so I don't think this is your problem, but it is worth checking. (You can also reduce the MTU and alter the MTU discovery algorithm if necessary, but you should probably leave this alone unless this is your issue and you can't fix the router.)
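
One rough way to check the path is to send pings of different sizes with the Don't Fragment flag set and see which ones get a reply. The sketch below does that with System.Net.NetworkInformation.Ping. The default target address, the timeout, and the list of payload sizes are assumptions for illustration; pass the real server or client address as the first argument.

```csharp
using System;
using System.Net.NetworkInformation;

class PathMtuProbeSketch
{
    static void Main(string[] args)
    {
        // Assumption: loopback is only a placeholder target.
        string host = args.Length > 0 ? args[0] : "127.0.0.1";

        // TTL of 64 and dontFragment = true, so oversized packets are not split.
        PingOptions options = new PingOptions(64, true);

        // ICMP payload sizes; the IP packet is 28 bytes larger
        // (20-byte IP header plus 8-byte ICMP header).
        int[] payloadSizes = { 1472, 1400, 1200, 548 };

        using (Ping ping = new Ping())
        {
            foreach (int size in payloadSizes)
            {
                PingReply reply = ping.Send(host, 2000, new byte[size], options);
                Console.WriteLine("payload {0} (IP packet {1}): {2}",
                    size, size + 28, reply.Status);
            }
        }
    }
}
```

If small payloads succeed while larger ones time out without any error coming back, that would be consistent with a router silently dropping the ICMP messages; a status such as PacketTooBig would suggest discovery is working and the path simply has a smaller MTU.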

  • Less Likely: Failed attempt to set up an IPSec connection.

If the Windows machine is in a domain, it may be attempting and failing to set up an IPSec relationship. This will depend on the configuration of both the client and the server. Normally this would fail quickly, but if you are blocking some IPSec traffic, you may see it failing slowly. Look for IKE traffic (UDP port 500) and IPSec traffic (ESP is IP protocol 50) in your network analyser.