9 votes

I've read all of the articles I can find on Heroku about Puma and dyno types and I can't get a straight answer.

I see some mentions that the number of Puma workers should be determined by the number of cores. I can't find anywhere that Heroku reveals how many cores a performance-M or performance-L dyno has.

In this article, Heroku hinted at an approach: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server

I think they're suggesting setting the thread count to 1 and increasing the number of Puma workers until you start to see R14 (memory) errors, then backing off. After that, increase the number of threads until the CPU maxes out, although I don't think Heroku reports on CPU utilization.
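For reference, the approach in that article drives Puma from environment variables, so tuning doesn't require code changes. Below is a minimal config/puma.rb sketch along those lines; WEB_CONCURRENCY and RAILS_MAX_THREADS are the variable names Heroku's Rails guides conventionally use, and the defaults here are placeholders for illustration.

    # config/puma.rb -- minimal sketch, assuming the conventional
    # WEB_CONCURRENCY (worker processes) and RAILS_MAX_THREADS
    # (threads per worker) environment variables.

    # Threads per worker: keep at 1 while tuning workers, raise later.
    threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 1))
    threads threads_count, threads_count

    # Worker processes: raise until you approach R14 memory warnings.
    workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

    # Load the app before forking so workers share copy-on-write memory.
    preload_app!

    port        ENV.fetch("PORT", 3000)
    environment ENV.fetch("RACK_ENV", "development")

With something like this in place, you can adjust the values per dyno size with heroku config:set rather than editing code.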

Can anyone provide guidance?

(I also want to decide whether I should use one performance-L or multiple performance-M dynos, but I think that will be clear once I figure out how to set the workers & threads)

2
Did you ever figure it out? I have exactly the same issue now. I have run heroku run "cat /proc/cpuinfo" --app yourappname (CPU info), heroku run "cat /proc/meminfo" --app yourappname (memory info), and heroku run "top" --app yourappname (plain Linux top), but I cannot get into the dynos that are actually serving traffic. – Ming Hsieh

2 Answers

12 votes

The roadmap I have figured out so far looks like this:

  1. heroku run "cat /proc/cpuinfo" --size performance-m --app yourapp
  2. heroku run "cat /proc/cpuinfo" --size performance-l --app yourapp
  3. Write down the processor information you get.
  4. Google the model type, family, model, and stepping number of that Intel processor, and look up how many cores the processor has or simulates.
  5. Take a look at this: https://devcenter.heroku.com/articles/dynos#process-thread-limits
  6. Do some small experiments with standard-2X / standard-1X dynos to determine the `PUMA_WORKER` value.
  7. Do the math like this:

(max threads the desired dyno type supports) / (max threads the baseline dyno supports) x (the `PUMA_WORKER` value from your experiment on the baseline dyno) - (number of CPU cores)

For example, if PUMA_WORKER is 3 on my standard-2X baseline dyno, then the PUMA_WORKER value I would start testing with on performance-m would be:

16384 / 512 * 3 - 4 = 92

You should also consider how much memory your app consumes and pick the lower of the two numbers.
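If it helps, the same heuristic can be written out as a tiny script. This is just a sketch: the thread limits are the ones from the dyno limits page above, and the other numbers are the example values from this answer.

    # Sketch of the heuristic above, using the example numbers:
    # 512 = standard-2X process/thread limit, 16384 = performance-m limit,
    # 3 = PUMA_WORKER found by experiment on the baseline, 4 = CPU cores seen.
    def puma_worker_estimate(target_limit:, baseline_limit:, baseline_workers:, cpu_cores:)
      target_limit / baseline_limit * baseline_workers - cpu_cores
    end

    puts puma_worker_estimate(
      target_limit: 16_384,
      baseline_limit: 512,
      baseline_workers: 3,
      cpu_cores: 4
    )
    # => 92 -- then cap it further based on the memory your app actually uses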

EDIT: My answer above was written before heroku ps:exec was available. You can read the official documentation to learn how to SSH into a running dyno (for example heroku ps:exec --dyno=web.1 --app yourapp, then run cat /proc/cpuinfo or nproc inside the session). It should be considerably easier than before.

0 votes

We are currently facing the same issue for an application running in production on AWS (we use ECS), and are trying to find the right fit between:

  • Number of vCPUs / amount of RAM per instance
  • Number of instances
  • Number of Puma threads running per instance (each instance runs a single Puma process)

In order to better understand how our application consumes the pool of Puma threads, we did the following:

  • Exported Puma metrics (threads running + backlog) to CloudWatch; we then saw that at around 15 concurrent threads the backlog starts to grow (a rough sketch of such an exporter follows this list).
  • Compared this with vCPU usage; we saw that our vCPU never went above 25%.
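The sketch below shows roughly how such an exporter could look, assuming a single (non-clustered) Puma process and the aws-sdk-cloudwatch gem; the namespace and metric names are examples of my own choosing, not something Puma or AWS prescribes.

    # Rough sketch: a background thread that reads Puma's runtime stats and
    # pushes the running-thread count and request backlog to CloudWatch.
    # Assumes Puma in single mode (no clustered workers) and the
    # aws-sdk-cloudwatch gem; namespace/metric names are illustrative only.
    require "json"
    require "aws-sdk-cloudwatch"

    Thread.new do
      cloudwatch = Aws::CloudWatch::Client.new
      loop do
        stats = JSON.parse(Puma.stats) # e.g. {"backlog"=>0, "running"=>5, ...}
        cloudwatch.put_metric_data(
          namespace: "MyApp/Puma",
          metric_data: [
            { metric_name: "Backlog",        value: stats["backlog"].to_f, unit: "Count" },
            { metric_name: "RunningThreads", value: stats["running"].to_f, unit: "Count" }
          ]
        )
        sleep 60
      end
    end

Starting something like this from a Rails initializer (or a Puma plugin) is one way to run it alongside the server.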

Using these two pieces of information together, we decided on the sizing described above.

Finally, I would like to share this article, which I found very interesting on this topic.