7 votes

I'm hoping the community can clarify something for me, and that others can benefit.

My understanding is that Gunicorn worker processes are essentially virtual replicas of Heroku web dynos. In other words, Gunicorn's worker processes should not be confused with Heroku's worker processes (e.g. Django Celery tasks).

This is because Gunicorn worker processes are focused on handling web requests (basically scaling up the throughput of the Heroku web dyno), while Heroku worker dynos specialize in long-running background tasks such as remote API calls.

I have a simple Django app that makes decent use of Remote APIs and I want to optimize the resource balance. I am also querying a PostgreSQL database on most requests.

I know that this is very much an oversimplification, but am I thinking about things correctly?

Some relevant info:

https://devcenter.heroku.com/articles/process-model

https://devcenter.heroku.com/articles/background-jobs-queueing

https://devcenter.heroku.com/articles/django#running-a-worker

http://gunicorn.org/configure.html#workers

http://v3.mike.tig.as/blog/2012/02/13/deploying-django-on-heroku/

https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/gunicorn/

Other Quasi-Related Helpful SO Questions for those researching this topic:

Troubleshooting Site Slowness on a Nginx + Gunicorn + Django Stack

Performance degradation for Django with Gunicorn deployed into Heroku

Configuring gunicorn for Django on Heroku

1
A dyno is more like a host, and a gunicorn worker is a process running on that host. There is no one-to-one correspondence, as you can run more than one gunicorn worker process on a single dyno. If you're looking to tune your application and stack, you might consider New Relic. Since you get the New Relic standard subscription level for free on Heroku, there's no harm in at least trying it. — Graham Dumpleton
Thanks Graham, I am using New Relic and it's quite useful. My understanding is that a Heroku dyno is a single-threaded, one-process host, but that gunicorn is a process that can spawn workers to handle multiple web requests concurrently. That being said, I'm still looking for someone to confirm that a gunicorn worker is fundamentally different from a Heroku worker dyno. — BFar
Way back in time, dynos for Ruby were a single-threaded process. Not so now. You can actually run multiple processes in a dyno using foreman, and in the case of gunicorn, you could tell it to run three worker processes for handling requests. Technically I could (and am working on it) run Apache/mod_wsgi in a dyno and have multiple processes, all multithreaded, handling requests. — Graham Dumpleton
Under New Relic, the dynos tab is actually misleading, as it tells you how many web processes you have, not how many dynos. That used to work when the relationship was one-to-one, but not with the new Heroku dynos, and there's no simple fix for the disparity right now. — Graham Dumpleton
You're absolutely right that New Relic throws a new 'dyno' definition into the mix. However, while you can run ~3 gunicorn workers per web dyno, these still do not run background jobs the way Celery workers on Heroku worker dynos do, right? — BFar

1 Answer

16 votes

To provide an answer and prevent people from having to search through the comments: a dyno is like an entire computer. Using the Procfile, you give each of your dynos one command to run, and it cranks away on that command, restarting it periodically to refresh it and restarting it when it crashes. As you can imagine, it's rather wasteful to dedicate an entire computer to running a single-threaded webserver, and that's where Gunicorn comes in.
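For concreteness, a minimal Procfile along those lines might look like the sketch below; the project and module names (`myproject`) are placeholders, not from the question:

```
web: gunicorn myproject.wsgi
worker: celery -A myproject worker
```

Each line declares one process type; scaling the `web` dyno re-runs the first command on each new dyno, while the `worker` dyno handles the background jobs the question asks about.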

The Gunicorn master process does little more than manage a given number of copies of your application (workers), among which incoming HTTP requests are distributed. This takes advantage of the fact that each dyno actually has multiple cores. As someone mentioned, the number of workers you should choose depends on how much memory your app needs to run.
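A common starting point, from Gunicorn's own documentation, is `(2 x cores) + 1` workers, capped lower if your dyno is memory-constrained. A minimal sketch of a Gunicorn config file, assuming a hypothetical `myproject.wsgi` module:

```python
# gunicorn.conf.py -- picked up when you run `gunicorn myproject.wsgi`
import multiprocessing

# Rule-of-thumb worker count from the Gunicorn docs: (2 x cores) + 1.
# On a memory-constrained dyno you may need to set this lower.
workers = multiprocessing.cpu_count() * 2 + 1

# Heroku supplies the port via the $PORT environment variable in practice;
# this static bind is only for local illustration.
bind = "0.0.0.0:8000"
```

You can also pass `--workers N` directly on the Procfile command line instead of using a config file.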

Contrary to what Bob Spryn said in the last comment, there are other ways of exploiting this opportunity for parallelism to run separate servers on the same dyno. The easiest way is to make a separate sub-Procfile and run the all-Python Foreman equivalent, Honcho, from your main Procfile, following these directions. Essentially, in this case your single dyno command is a program that manages multiple single commands. It's kind of like being granted one wish from a genie, and making that wish be for four more wishes.
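Sketching that pattern (the file name `Procfile.multi` and the process commands are assumptions for illustration, not from the answer), the main Procfile hands its one command to Honcho:

```
web: honcho start -f Procfile.multi
```

and `Procfile.multi` then lists the processes multiplexed onto that dyno:

```
web: gunicorn myproject.wsgi
celery: celery -A myproject worker
```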

The advantage is that you use your dynos' full capacity. The disadvantage is that you lose the ability to scale the individual parts of your app independently when they share a dyno. When you scale the dyno, it scales everything you've multiplexed onto it, which may not be what you want. You will probably have to use diagnostics to decide when a service should be moved onto its own dedicated dyno.