
I have an app on Heroku that I forked to try out the new Europe region support. After some initial hurdles (see my earlier question, Heroku Europe region: "Application error" when trying to fork), this seems to be working fine.

I ran a simple load test with the ab (Apache Bench) tool and expected to see an improvement in requests per second. That was not the case:

Timings for 1000 requests with 10 concurrent users (ab -n 1000 -c 10 <URL>):

US:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      305  349  44.4    337     719
Processing:   128  356 282.8    244    2213
Waiting:      127  350 283.0    238    2213
Total:        442  705 280.5    610    2521

EU:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      125  188  47.6    175     451
Processing:    67 2595 3537.9   3309   30171
Waiting:       66 2591 3538.9   3309   30170
Total:        207 2783 3536.2   3472   30321

A few things jump out:

  • Latency (see "Connect:") is clearly lower for EU than for US, as expected. Very good.
  • Some requests in EU take about 30s. What?
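To make those two observations concrete, here is a small Ruby sketch that parses the two "Connection Times" blocks above and puts the means side by side. The figures are copied verbatim from the ab output; parse_connection_times is a hypothetical helper, not part of ab. It shows that the EU Connect mean is lower (188 ms vs 349 ms), while the EU Processing mean is roughly seven times the US one (2595 ms vs 356 ms), and the EU maximum Total (30321 ms) lands right at Heroku's 30s router timeout.

```ruby
# Hypothetical helper: turns ab's "Connection Times (ms)" block into
# { "Connect" => { min:, mean:, sd:, median:, max: }, ... } (values in ms).
def parse_connection_times(text)
  text.each_line.with_object({}) do |line, rows|
    # Data rows look like "Connect:      305  349  44.4    337     719";
    # the title and column-header lines do not match and are skipped.
    m = line.match(/\A\s*(\w+):\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(\d+)\s+(\d+)\s*\z/)
    next unless m
    rows[m[1]] = { min: m[2].to_i, mean: m[3].to_i, sd: m[4].to_f,
                   median: m[5].to_i, max: m[6].to_i }
  end
end

US = parse_connection_times(<<~AB)
  Connect:      305  349  44.4    337     719
  Processing:   128  356 282.8    244    2213
  Waiting:      127  350 283.0    238    2213
  Total:        442  705 280.5    610    2521
AB

EU = parse_connection_times(<<~AB)
  Connect:      125  188  47.6    175     451
  Processing:    67 2595 3537.9   3309   30171
  Waiting:       66 2591 3538.9   3309   30170
  Total:        207 2783 3536.2   3472   30321
AB

HEROKU_ROUTER_TIMEOUT_MS = 30_000

# Print the mean of each phase for both regions and the EU/US ratio.
%w[Connect Processing Total].each do |row|
  ratio = EU[row][:mean].to_f / US[row][:mean]
  puts format("%-10s mean US %5d ms, EU %5d ms (EU/US %.2f)",
              row, US[row][:mean], EU[row][:mean], ratio)
end
puts "EU max Total reaches the router timeout" if EU["Total"][:max] >= HEROKU_ROUTER_TIMEOUT_MS
```

A ratio below 1 for Connect and well above 1 for Processing points at time spent inside the app rather than on the network.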

After investigation, this is what I found:

  • Heroku has a default request timeout of 30s: the router cuts off any request that takes longer than that. This explains the maximum of about 30s.
  • But why is this happening? In the logs I found that requests were initially fast, then took longer and longer until they eventually seemed to "hang" in a Unicorn worker. Those workers have a timeout of 15s, after which the master Unicorn process kills and restarts the worker (this shows up in the Heroku logs as an H13 error). I assume the request is not retried, though, which would result in the final time of 30s.

I have looked at New Relic, but these curiously slow requests just show 99% of their time spent in an unspecified "Application code (ROOT)" segment (listed separately from the actual DB access etc.), with no way to drill down (note: I'm on the free plan). There seems to be nothing wrong with the app code itself (it runs fine in US, after all).
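When the profiler cannot drill down, one generic way to find the hanging requests is to log each request's start and finish from inside the app. The sketch below is a hypothetical Rack middleware: SlowRequestLogger is not part of Rack or any gem, and the 5-second threshold is an arbitrary assumption. The "start" line is written before the app runs, and Unicorn's master kills timed-out workers with SIGKILL, so a killed worker leaves a "start" line with no matching "finish" line for the same pid, which points at the request that hung.

```ruby
# Hypothetical Rack middleware (not part of Rack or any gem) that logs a line
# when each request starts and another when it finishes, tagging slow ones.
class SlowRequestLogger
  # threshold: seconds after which a finished request is tagged SLOW (assumed value)
  # out:       any IO-like object responding to #puts (Heroku collects stdout)
  def initialize(app, threshold: 5.0, out: $stdout)
    @app = app
    @threshold = threshold
    @out = out
  end

  def call(env)
    request = "#{env['REQUEST_METHOD']} #{env['PATH_INFO']}"
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Written before the app runs: if the worker is killed mid-request, this
    # "start" line has no matching "finish" line for the same pid.
    @out.puts("start pid=#{Process.pid} #{request}")
    begin
      @app.call(env) # the [status, headers, body] triplet is passed through unchanged
    ensure
      # Runs on normal return and on exceptions, but not if the process is SIGKILLed.
      # Measures time until the app returns; streaming the body afterwards is not included.
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      tag = elapsed >= @threshold ? "SLOW" : "ok"
      @out.puts(format("finish pid=%d %s %.3fs %s", Process.pid, request, elapsed, tag))
    end
  end
end
```

In a Rails app it could be registered with config.middleware.use SlowRequestLogger (or use SlowRequestLogger in config.ru). On Heroku, setting $stdout.sync = true may be needed so the "start" lines are not still sitting in a buffer when the worker is killed.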

My question: how do I go about debugging this? Is there some configuration I could change to solve this? What am I missing?

UPDATE:

I tried the suggestions in the comments, and in the end disabled the Memcachier add-on entirely for EU (I also changed the app to do no caching at all, just as a test).

This did not resolve the Unicorn timeouts, although they now seem to occur less often (!).

Are you using any add-ons? – Jan Wrobel
Yep, mainly Memcachier and New Relic. I enabled Blitz as well but have not used it so far. Disabling New Relic didn't change anything. My current suspicion is a Memcachier problem in EU, but I haven't tested that yet. – Tom De Leu
Have you provisioned Memcachier for the EU instance, or just copied the Memcachier configuration variables from your US instance? In the second case, the application would be using memcache in the US, which would be bad. – Jan Wrobel
I didn't provision or configure anything separately; I just forked for EU, and the docs say forking "re-provisions all add-ons" (blog.heroku.com/archives/2013/4/24/europe-region). Since everything initially worked, I didn't look further into it. – Tom De Leu

1 Answer


I suspect one of your add-ons was provisioned in a different region. Retrieve the Memcachier URL with config:get:

$ heroku config:get MEMCACHIER_URL

Then paste the domain into a lookup service such as http://ip-lookup.net/ to see where the add-on is hosted geographically.
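If you would rather check from the command line, a small Ruby sketch can pull the host name(s) out of the config value and resolve them to IP addresses to paste into a lookup service. The shape of the value is an assumption: the hypothetical memcache_hosts helper accepts either a URL (memcache://user:pass@host:11211) or a comma-separated host:port list, and the script name memcache_hosts.rb is made up for illustration.

```ruby
require "resolv"
require "uri"

# Hypothetical helper: extracts host names from a config value that is either
# a URL ("memcache://user:pass@host:11211") or a comma-separated
# "host:port" list. Check what `heroku config:get` actually prints for your app.
def memcache_hosts(value)
  value.split(/[,\s]+/).reject(&:empty?).map do |entry|
    entry.include?("://") ? URI.parse(entry).host : entry.split(":").first
  end.compact.uniq
end

# Resolves each host to its IP addresses (an empty array if DNS has no answer).
def resolve_hosts(hosts)
  hosts.to_h { |host| [host, Resolv.getaddresses(host)] }
end

if __FILE__ == $PROGRAM_NAME
  # Value from the first argument, falling back to the environment variable.
  value = ARGV.first || ENV.fetch("MEMCACHIER_URL", "")
  resolve_hosts(memcache_hosts(value)).each do |host, ips|
    puts "#{host}: #{ips.empty? ? '(no DNS answer)' : ips.join(', ')}"
  end
end
```

Usage: ruby memcache_hosts.rb "$(heroku config:get MEMCACHIER_URL)"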

If it's in the US, then you need to reprovision in the EU:

$ heroku addons:remove memcachier
$ heroku addons:add memcachier

It should then be provisioned in the right region.

If an add-on was provisioned in the wrong region, then it's a bug. If so, comment here with your app name and we can look into it.