20
votes

I have a fairly high-load deployment on Azure: 4 Large instances serving about 300-600 requests per second. Under normal conditions the "Average Response Time" is 70 to 150 ms; it sometimes grows to 200-300 ms, which is perfectly acceptable.

However, once or twice a day (not during rush hours) I see the following picture on the Web Site Monitoring tab:

Azure Web Site Monitoring

The number of requests per minute drops significantly, the average response time grows to as much as 3 minutes, and after a while everything returns to normal.

During this "blackout" only 0.1% of requests are dropped (HTTP server errors with a timeout); the other requests just wait in the queue and are processed normally after a few minutes. However, not all clients are willing to wait :-(

Memory usage is under 30% all the time, CPU usage is only up to 40-50%.
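
To pin down when these blackouts happen, it may help to flag them automatically from exported per-minute monitoring data. The following is only a minimal sketch: the `MinuteSample` type, the `find_blackouts` function, and the 50% drop ratio are assumptions made for illustration, and the 300 ms threshold is taken from the "normal" range described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MinuteSample:
    minute: int             # minutes since the start of the exported window
    requests: int           # requests completed during that minute
    avg_response_ms: float  # average response time during that minute


def find_blackouts(
    samples: List[MinuteSample],
    baseline_requests: float,
    max_normal_ms: float = 300.0,  # upper end of the "normal" latency range
    drop_ratio: float = 0.5,       # assumed: below 50% of baseline counts as a drop
) -> List[Tuple[int, int]]:
    """Return (first_minute, last_minute) ranges where throughput drops
    and average latency spikes at the same time."""
    windows: List[Tuple[int, int]] = []
    start: Optional[int] = None
    previous: Optional[int] = None
    for sample in samples:
        degraded = (
            sample.requests < baseline_requests * drop_ratio
            and sample.avg_response_ms > max_normal_ms
        )
        if degraded and start is None:
            start = sample.minute              # a blackout window opens
        elif not degraded and start is not None:
            windows.append((start, previous))  # window closed on the previous minute
            start = None
        previous = sample.minute
    if start is not None:
        windows.append((start, previous))      # window still open at end of data
    return windows
```

The resulting time ranges can then be lined up against deployment, recycling, and dependency logs to look for something that coincides with each blackout.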

What have I already checked?

  1. Traces for timed-out requests: they timed out at random locations.
  2. Throttling for Azure Storage and other components used: no throttling at all.
  3. I also tried routing all traffic through CloudFlare and saw the same problems.

What could be the reason for such problems? What may I check next?
Thank you all in advance!

Update 1: BenV suggested a good thing to try, but unfortunately it revealed nothing :-(
I configured process recycling every 500k requests and also added worker nodes, so CPU utilization is now below 40% all day long, but the blackouts still occur.

Update 2: The project uses ASP.NET MVC 4.

2
I had a similar problem with a really small application. I tried a lot of things, and the solution was to clear the handlers at the beginning and add them manually. Maybe it helps you too. - user2721793

2 Answers

8
votes

I had this exact same problem. Whenever the site failed, there were a lot of WinCache errors in my logs.

WinCache is a caching extension that IIS uses to speed up PHP processing. It's a Microsoft-built add-on that is enabled by default in IIS and on all Azure sites. WinCache would get hung up and, instead of recycling and continuing, it would consume all the memory and file handles on an instance, essentially locking it up.

I added a new App setting in the Azure Portal (the standard key for this is PHP_INI_SCAN_DIR) so that PHP scans a folder for additional php.ini settings:
d:\home\site\ini

I then added a file, d:\home\site\ini\settings.ini, that contains the following:

wincache.fcenabled=1
session.save_handler = files
memory_limit = 256M
wincache.chkinterval=5
wincache.ucachesize=200
wincache.scachesize=64
wincache.enablecli=1
wincache.ocenabled=0 


wincache.fcenabled=1

Enables file caching using WinCache (I think that's the default anyway)

session.save_handler = files

Changes the session handler from WinCache (the Azure default) to the standard file-based handler, to reduce stress on the cache engine.

memory_limit = 256M
wincache.chkinterval=5
wincache.ucachesize=200
wincache.scachesize=64
wincache.enablecli=1

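Sets the PHP memory limit to 256 MB per script and caps the WinCache user and session cache sizes (both in MB). wincache.chkinterval sets how often, in seconds, WinCache checks files for changes, and wincache.enablecli turns WinCache on for command-line PHP. Together this forces WinCache to clear out old data and recycle the cache more often.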

wincache.ocenabled=0 

This is the big one: it disables WinCache opcode caching, that is, WinCache caching the compiled PHP scripts in memory. Files are still cached by the file cache enabled in the first setting, but PHP is interpreted as normal and not cached into large binary files.
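
Since a typo in settings.ini would silently leave the defaults in place, a quick sanity check of the file contents may be worthwhile before deploying it. The sketch below is an illustration only: `check_ini` and `EXPECTED` are hypothetical names, and it uses Python's `configparser` with a dummy `[php]` section header because the file itself has no sections. It validates the text of the file, not what PHP actually loaded.

```python
import configparser
from typing import Dict, List

# Settings from the answer above that matter most for the fix.
EXPECTED: Dict[str, str] = {
    "wincache.ocenabled": "0",
    "wincache.fcenabled": "1",
    "session.save_handler": "files",
}


def check_ini(text: str, expected: Dict[str, str] = EXPECTED) -> List[str]:
    """Return a list of human-readable problems; an empty list means all
    expected keys are present with the expected values."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser requires a section header, so prepend a dummy one.
    parser.read_string("[php]\n" + text)
    section = parser["php"]
    problems: List[str] = []
    for key, want in expected.items():
        got = section.get(key)  # None when the key is missing
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    return problems
```

Calling `check_ini(open(path).read())` on a local copy of the file returns an empty list when the three key settings match.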

I went from having my Azure Website crash about once every 3 days, with logs that look like yours, to 120 days straight so far without any issues.

Good luck!

5
votes

There are some nice tools available for Web Apps in the preview portal.

Azure Web Apps tools menu

The Application Insights extension in particular can be useful for monitoring and troubleshooting app performance.
