In my current project, our team uses WCF services hosted on IIS.
Here are some technical details that may be relevant:
- We use .NET 3.5 for the WCF services
- We use the net.tcp communication protocol
- We use both IIS 7 and IIS 7.5 to host these services
- We use multiple IIS worker processes on each server
The problem is that the WCF services sometimes become unavailable. When we try to reach them, we get a timeout error, and the only way to restore them is to restart the NetTcpActivator (Net.Tcp Listener Adapter) Windows service.
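As a stopgap, the manual restart described above could be automated. Below is a minimal Python sketch, not something we actually run. It assumes a plain TCP connect is a good enough health check, which may not hold: a hung SMSvcHost might still accept the socket, so a real probe would probably need to call a lightweight WCF operation instead. The function names, the 5-second timeout, and the port you pass in (808 is the usual net.tcp default) are all assumptions for illustration.

```python
import socket
import subprocess

# Service name of the Net.Tcp Listener Adapter, as mentioned in the post.
SERVICE_NAME = "NetTcpActivator"


def port_accepts_connections(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def restart_commands(service=SERVICE_NAME):
    """Build the command lines that stop and then start a Windows service."""
    return [["net", "stop", service, "/y"], ["net", "start", service]]


def check_and_restart(host, port, run=subprocess.check_call):
    """Restart the listener adapter if the probe fails.

    Returns True if a restart was issued, False if the port was reachable.
    `run` is injectable so the logic can be exercised without touching services.
    """
    if port_accepts_connections(host, port):
        return False
    for cmd in restart_commands():
        run(cmd)
    return True
```

Something like `check_and_restart("localhost", 808)` could then be run from an elevated scheduled task. It only hides the symptom, of course, and does not explain the hang.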
According to my colleague's theory, this error may be related to the problems described in this KB article:
FIX: Smsvchost.exe for the WCF service stops responding when you run a .NET Framework 4-based WCF service http://support.microsoft.com/kb/2536618
According to this article, SMSvcHost (the process that hosts NetTcpActivator and the Port Sharing Service) stops responding if it cannot route a request to w3wp (the IIS worker process) within 60 seconds (a non-configurable timeout). Unfortunately, we have not found a way to reproduce this error. For example, we limited SMSvcHost to 1 CPU core and 1 thread, raised the pending connections limit to 1M, and pushed it to 100% CPU load in user mode. It still didn't hang!
Sometimes our load tests produce strange errors, but when we stop the tests, all services recover to their normal state automatically. Yet sometimes even a load that is not heavy can hang NetTcpActivator!
In addition, this is not a new problem. My colleagues already ran into it 3 years ago (see this thread for additional information: http://forums.iis.net/t/1167668.aspx/1/10). Unfortunately, they never got an answer. The problem just disappeared after some configuration changes! Now it has come back on the new server.
I would really appreciate all your thoughts and ideas!