5 votes

Keep getting an Error R14 (Memory quota exceeded) on Heroku.

Profiling the memory of the Django app locally, I don't see any issues. We've installed New Relic, and things seem to be fine there, except for one oddity:

http://screencast.com/t/Uv1W3bjd

Memory use hovers around 15 MB per dyno, but for some reason the 'dynos running' metric quickly scales up to 10+. Not sure how that makes any sense, since we are currently only running one web dyno.

We are also running Celery, and things look normal there as well (around 15 MB), although it is suspicious because I believe we started having the error around the time this launched.
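For anyone who wants to spot-check per-process memory locally, here is a minimal sketch (assuming psutil is installed; not necessarily the profiling setup used here):

# Minimal local memory spot-check; requires: pip install psutil
import os
import psutil

def log_rss(label=""):
    # Print the resident set size (RSS) of the current process in MB.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024.0 * 1024.0)
    print("[%s] RSS: %.1f MB" % (label, rss_mb))

# Example: call around a suspect view or task
log_rss("before request")
# ... do work ...
log_rss("after request")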

Some of our requests do take a while, as they make a SOAP request to EchoSign, which can sometimes take 6-10 seconds to respond. Is this somehow blocking and causing new dynos to spin up?
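One way to keep those slow calls from tying up web workers would be to push the SOAP request into a Celery task, since Celery is already running. A rough sketch (the task and helper names are hypothetical, not from the actual codebase):

# Sketch: offload the slow EchoSign SOAP call to a Celery task so gunicorn
# workers aren't blocked for 6-10 seconds per request.
from celery import shared_task

def send_to_echosign(agreement_id):
    # Placeholder for the real SOAP call to EchoSign (e.g. via a suds client).
    pass

@shared_task
def send_agreement_to_echosign(agreement_id):
    # Runs in a Celery worker, so the web request can return immediately.
    send_to_echosign(agreement_id)

# In the Django view, enqueue instead of calling the SOAP service inline:
# send_agreement_to_echosign.delay(agreement.id)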

Here is my Procfile:

web: python manage.py collectstatic --noinput; python manage.py compress; newrelic-admin run-program python manage.py run_gunicorn -b "0.0.0.0:$PORT" -w 9 -k gevent --max-requests 250
celeryd: newrelic-admin run-program python manage.py celeryd -E -B --loglevel=INFO

The main problem is the memory error, though.


1 Answer

14 votes

I BELIEVE I may have found the issue.

Based on posts like these, I thought I should have somewhere in the area of 9-10 gunicorn workers. I believe this is incorrect (or at least, it is for the work my app is doing).

I had been running 9 gunicorn workers, and finally realized that was the only real difference between Heroku and local (as far as configuration goes).

According to the gunicorn design document, the advice for workers goes something like this:

DO NOT scale the number of workers to the number of clients you expect to have. Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.

Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request.

And while the information out there about Heroku dyno CPU capability is sparse, I've now read that each dyno runs on something around 1/4 of a core. Not super powerful, but powerful enough, I guess.
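Plugging that rough 1/4-core figure into gunicorn's formula gives a back-of-the-envelope number (the 0.25 is an assumption, not an official Heroku spec):

# Rough worker count using gunicorn's (2 x $num_cores) + 1 rule of thumb.
cores = 0.25                  # assumed share of a core per dyno
workers = (2 * cores) + 1     # -> 1.5
print(workers)                # 1.5, i.e. roughly 2-3 workers, nowhere near 9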

Dialing my workers down to 3 (which is still high according to their rough formula) appears to have stopped my memory issues, at least for now. When I think about it, the interesting thing about the memory warning I would get is that it would never go up. It got to around 103% and then just stayed there, whereas if it was actually a leak, it should have kept rising until being shut down. So my theory is that my workers were eventually consuming just enough memory to go above 512 MB.
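For concreteness, the web line from the Procfile above with the worker count dialed down looks like this (only the -w value changes):

web: python manage.py collectstatic --noinput; python manage.py compress; newrelic-admin run-program python manage.py run_gunicorn -b "0.0.0.0:$PORT" -w 3 -k gevent --max-requests 250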

HEROKU SHOULD ADD THIS INFORMATION SOMEWHERE!! And at the very least I should be able to run top on my running dyno to see what's going on. Would have saved me hours and days.