I have an app on Heroku that I forked to try out the Europe region support. After some initial hurdles (see: Heroku Europe region: "Application error" when trying to fork), the fork seems to be working fine.
I ran a simple load test with the ab (Apache Bench) tool and expected to see improvements in requests per second. However, not so:
Timings for 1000 requests with 10 concurrent users (ab -n 1000 -c 10 <URL>)
US:
Connection Times (ms)
               min   mean[+/-sd]  median    max
Connect:       305   349    44.4     337    719
Processing:    128   356   282.8     244   2213
Waiting:       127   350   283.0     238   2213
Total:         442   705   280.5     610   2521
EU:
Connection Times (ms)
               min   mean[+/-sd]  median    max
Connect:       125   188    47.6     175    451
Processing:     67  2595  3537.9    3309  30171
Waiting:        66  2591  3538.9    3309  30170
Total:         207  2783  3536.2    3472  30321
Some things jump out:
- Latency (see "Connect:") is clearly lower for EU than for US, as expected. Very good.
- Processing time in EU is far worse, and some requests take 30 s. What is going on there?
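As a rough cross-check of the "requests per second" expectation, the mean totals above can be turned into throughput estimates. This is a back-of-the-envelope sketch, assuming throughput is approximately the concurrency level divided by the mean total time per request; the constant and method names are made up for illustration.

```ruby
# Rough throughput estimate from the ab "Total" mean values above.
# Assumption: requests/second ~= concurrency / mean total time per request.
CONCURRENCY = 10 # from `ab -c 10`

# Mean "Total" time per request in milliseconds, copied from the ab output.
MEAN_TOTAL_MS = { "US" => 705, "EU" => 2783 }

# Hypothetical helper: convert a mean request time into requests per second.
def requests_per_second(concurrency, mean_total_ms)
  concurrency / (mean_total_ms / 1000.0)
end

ESTIMATES = MEAN_TOTAL_MS.transform_values do |ms|
  requests_per_second(CONCURRENCY, ms).round(1)
end

ESTIMATES.each do |region, rps|
  puts format("%s: ~%.1f requests/second", region, rps)
end
```

Under that assumption the US app handles roughly 14 requests per second and the EU app fewer than 4, despite the lower connect latency in EU.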
After investigation, this is what I found:
- Heroku has a default request timeout of 30 s: any request that takes longer than that is cut off. This explains the 30 s maximum.
- But why is this happening? I looked in the logs and found that requests were fast at first, then took longer and longer, until eventually they seemed to "hang" in a Unicorn worker. Those workers have a timeout of 15 s, after which the master Unicorn process kills and restarts the worker (this shows up in the Heroku logs as an H13 error). I assume the killed request is not retried, though, which results in the final time of 30 s.
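For context, a Unicorn configuration that produces the 15 s worker timeout described above would look roughly like the following. This is a minimal sketch, not the actual file from this app: the path config/unicorn.rb, the WEB_CONCURRENCY variable, the worker count of 3 and the preload_app setting are all assumptions. Only the 15 s timeout comes from the setup described here.

```ruby
# config/unicorn.rb (assumed path; illustrative sketch only)

# Number of worker processes per dyno (assumed value and env variable).
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)

# The master kills any worker that spends longer than 15 s on one request.
timeout 15

# Load the app in the master before forking workers (assumption).
preload_app true
```

Because the worker timeout (15 s) is shorter than the 30 s request timeout, a hung request is expected to be killed by the Unicorn master before the request timeout is reached.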
I have looked at New Relic, but for these curious slow requests it just shows 99% of the time spent in some unspecified "Application code (ROOT)" (listed separately from the actual DB access etc.), with no way to drill down (note: I'm on a free plan). There seems to be nothing wrong with the app code (it runs fine in the US region, after all).
My question: how do I go about debugging this? Is there some configuration I could change to solve this? What am I missing?
UPDATE:
I tried out the suggestions in the comments, and in the end disabled the Memcachier addon entirely for EU (plus I changed the app to do no caching at all anymore, just as a test).
This did not resolve the Unicorn timeouts, although they seem to occur less often now (!).