
Recently my company got some media coverage, and as a result we have been getting a lot of traffic on our website. We are an online tutoring company (http://rayku.com) that lets students get on-demand help from tutors over an interactive HTML5 whiteboard. Tutors are notified through Google Talk, and both users are connected to the whiteboard once the tutor clicks a link in an automated Google Talk message.

The problem we're having is related to Varnish when connecting to the whiteboard. At random times, a tutor is unable to reach the whiteboard and gets the following error message:

http://grab.by/i65A

  error connecting to server: 503 Service Unavailable

  Error 503 Service Unavailable
  Service Unavailable
  Guru Meditation:
  XID: 1564976246
  Varnish cache server

After clearing my cookies (but not my cache), the problem is resolved. Unfortunately, it is difficult to replicate, and I suspect that it is related to Varnish's cache overloading and not applying the proper parameters.

Could you please help me debug this issue? Many tutors have reported this problem, and many sessions are being dropped because of it :).

Much appreciated! Donny

What steps have you taken to identify the root cause of the issue? - Mike Brant
Unfortunately, the root cause is difficult to find: the issue is hard to replicate and occurs randomly. We believe it's a caching issue in Varnish, and that the cache isn't expiring quickly enough. - Donny

2 Answers


Varnish 3.0.7: There are various possible causes for this error. I am currently investigating it on one of my servers and am finding that I have more than one issue. In my case, the "first read error" was caused by slow pages (slow to fetch images). I found it with the varnishlog command, which here reads the existing log (-d), shows only client-side entries (-c), and keeps only transactions whose TxStatus matches 503 (-m TxStatus:503):

https://www.varnish-cache.org/docs/3.0/tutorial/troubleshooting.html

  varnishlog -d -c -m TxStatus:503

     24 SessionOpen  c 127.0.0.1 39370 :6081
     24 ReqStart     c 127.0.0.1 39370 657793361
     24 RxRequest    c GET
     24 RxURL        c /inventory/part/MYPAGE.HTML
     24 RxProtocol   c HTTP/1.1
     24 RxHeader     c User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36
     24 RxHeader     c Referer: https://MYWEBSITE.com/inventory/new?limit=100
     24 RxHeader     c Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/msword, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, */*
     24 RxHeader     c Accept-Encoding: gzip, deflate
     24 RxHeader     c Accept-Language: en-US
     24 RxHeader     c Pragma: no-cache
     24 RxHeader     c Host: MYWEBSITE.com
     24 RxHeader     c Cookie: PHPSESSID=crp0881ji0qgfdqgtde10ovh72; laravel_session=eyJpdiI6ImIrRVVGRFBWdHErdk85cU9oQThqemc9PSIsInZhbHVlIjoid0g2Zk56elVybUdlVkVQb0dCdzlVVVBhMWVmVlwvZnRPOFlEOVwvQjRWOW5ITUVyNUFCMGZyRUI5aDlGSVBoWWpsR0Z3NGxZK2NjQ2Z6Q01Lam5IWVdcL3c9PSIsIm1hYyI6ImZlZ
  ....   ....  ....

This shows me that Varnish was failing when fetching the URI above from MYWEBSITE.com. In this case that is expected, as there is a problem with the page itself.

Varnish has some backend settings that help mitigate this and other issues. The ones to pay attention to are:

  .connect_timeout = 1s; # Wait a maximum of 1s to establish a connection to the backend (Apache, Nginx, etc.)
  .first_byte_timeout = 120s; # Wait a maximum of 120s for the first byte to come from your backend
  .between_bytes_timeout = 2s; # Wait a maximum of 2s between bytes received from the backend

If you see the "first read error" in the log, it can often be resolved by raising first_byte_timeout. That did not fix everything on my server, but it does apply to the case logged above. Next I am going to experiment with connect_timeout, which limits how long Varnish waits to open a connection to the backend (not how long it waits for a response). If you are getting "unhealthy" messages, you need to adjust the .probe settings in your Varnish config file. Another simple cause is that the ports are misconfigured, or the probe file (/httpcheck or whatever you use) is actually inaccessible; check that you can reach it with curl.
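
To show how these settings fit together, here is a minimal sketch of a Varnish 3.x backend definition that combines the three timeouts above with a health probe on /httpcheck. The host, port, and all probe values (.timeout, .interval, .window, .threshold) are assumptions for illustration, not values from this thread.

```vcl
# Varnish 3.x sketch; host, port and probe values are assumptions.
backend default {
    .host = "127.0.0.1";          # assumed backend address
    .port = "8080";               # assumed backend port
    .connect_timeout = 1s;        # time allowed to open the connection
    .first_byte_timeout = 120s;   # time allowed for the first byte of the response
    .between_bytes_timeout = 2s;  # max gap between bytes of the response
    .probe = {
        .url = "/httpcheck";      # health-check URL; expected to return 200 (the default)
        .timeout = 1s;            # assumed: each probe may take up to 1s
        .interval = 5s;           # assumed: probe every 5 seconds
        .window = 5;              # look at the last 5 probes...
        .threshold = 3;           # ...backend is healthy if at least 3 succeeded
    }
}
```

If the probe keeps marking the backend as sick, requesting the same URL with curl from the Varnish host (e.g. `curl -i http://127.0.0.1:8080/httpcheck`, using whatever host and port you actually configured) should return a 200.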

PS: I am responding to this thread because I noticed that it wasn't sufficiently answered. I hope my investigation helps someone at least a little.

More reading here: https://varnish-cache.org/tips/varnishlog/fetcherror.html


This seems to have been resolved thanks to the wonderful guys at Rackspace! It turns out that it was a cache issue: directing Varnish to cache only image, JS, and CSS files resolved my problem.
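
The actual Rackspace configuration isn't shown here. As a rough sketch, assuming Varnish 3.x syntax and a guessed list of file extensions, "cache only images, JS and CSS" could look like this in VCL: static assets have cookies stripped and are looked up in the cache, while every other request (including the whiteboard links) is passed straight to the backend.

```vcl
# Varnish 3.x sketch; the extension list is an assumption.
sub vcl_recv {
    # Non-GET/HEAD requests (form posts, etc.) are never cached.
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    # Static assets: drop the client's cookies so one cached copy serves everyone.
    if (req.url ~ "\.(png|gif|jpe?g|ico|css|js)(\?.*)?$") {
        unset req.http.Cookie;
        return (lookup);
    }
    # Everything else, including the whiteboard links, goes straight to the backend.
    return (pass);
}

sub vcl_fetch {
    # Stop the backend from setting cookies on static assets, so that the
    # built-in vcl_fetch does not turn them into hit-for-pass objects.
    if (req.url ~ "\.(png|gif|jpe?g|ico|css|js)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
    }
}
```

In Varnish 4 and later, `return (lookup)` became `return (hash)`, `req.request` became `req.method`, and `vcl_fetch` was replaced by `vcl_backend_response` (which uses `bereq` rather than `req`), so this sketch would need adjusting there.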