1
votes

I have an Elixir/Phoenix application running in production, and after a while one of the beam.smp processes goes to 100% CPU load (sometimes more than one process). I'm not aware of any trigger causing this. How can I find out what's happening?

EDIT:

I ran iex on the server and connected to the Phoenix node. Then I ran etop and got this output:

Load:  cpu       100               Memory:  total       69429    binary      10568
        procs     303                        processes   16656    code        20194
        runq        1                        atom          727    ets          7205
Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function
----------------------------------------------------------------------------------------
<19947.645.0>  cowboy_protocol:init     '-'90164000   88736       0 'Elixir.MyApp.Error
<19947.902.0>  cowboy_protocol:init     '-'88696000   88744       0 'Elixir.MyApp.Error
<19947.242.0>  'Elixir.Redix.Connec     '-'   11697   24704       0 gen_server:loop/6
<19947.240.0>  Elixir.Exq               '-'   10284   24664       0 gen_server:loop/6
<19947.236.0>  Elixir.Exq.Redis.Cli     '-'    9597   34520       0 gen_server:loop/6
<19947.1695.0> etop_txt:init/1          '-'    6258  230504       0 etop:update/1
<19947.245.0>  Elixir.Exq.Scheduler     '-'    4831   24664       0 gen_server:loop/6
<19947.241.0>  'Elixir.Redix.Connec     '-'    2339    8856       0 gen_server:loop/6
<19947.426.0>  Elixir.MyApp.Presen      '-'     262  143160       0 gen_server:loop/6
<19947.238.0>  Elixir.Exq.Stats         '-'     105   42344       0 gen_server:loop/6
========================================================================================

Those two cowboy_protocol:init entries are causing the problem. But why, and how can I stop, prevent, or debug it?

1 Answer

3
votes

Processes started with cowboy_protocol:init are the ones that handle HTTP requests. The high reduction count suggests they are stuck in some kind of infinite loop. Both processes appear to be executing the same function (the 'Elixir.MyApp.Error entry, truncated by etop, in the Current Function column), so there is a very high chance that this function is faulty.
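To see the full name of the function the busy processes are stuck in, you can ask the runtime directly from the connected iex session. The following is a minimal sketch, not a definitive tool: BusyProcs is a hypothetical helper module made up for this illustration, and it only relies on Process.list/0 and Process.info/2. Note that it ranks by total reductions since each process started, so a long-lived but healthy process can also rank high; sampling twice and comparing is more reliable.

```elixir
defmodule BusyProcs do
  # Hypothetical helper: returns the `n` processes with the highest
  # total reduction counts, with what each one is currently executing.
  def top(n \\ 5) do
    keys = [:reductions, :current_function, :current_stacktrace, :initial_call]

    Process.list()
    |> Enum.map(fn pid -> {pid, Process.info(pid, keys)} end)
    # Process.info/2 returns nil for processes that exited in the meantime.
    |> Enum.reject(fn {_pid, info} -> is_nil(info) end)
    |> Enum.sort_by(fn {_pid, info} -> info[:reductions] end, :desc)
    |> Enum.take(n)
  end
end

for {pid, info} <- BusyProcs.top(3) do
  # Print the pid, its reduction count and the untruncated current function.
  IO.inspect({pid, info[:reductions], info[:current_function]})
  # The stacktrace shows how the process got to that function.
  IO.puts(Exception.format_stacktrace(info[:current_stacktrace]))
end
```

Once the offending process is identified, Process.exit(pid, :kill) stops it, but that only treats the symptom; the function shown in the stacktrace is what needs fixing.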

An infinite loop in tail position doesn't consume any additional memory, only CPU. This is very much a feature: a GenServer works exactly this way, as an infinite loop in tail position, so neither the compiler nor the runtime has any way of distinguishing between faulty and correct code that uses this pattern.
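The effect is easy to reproduce in isolation. The sketch below uses a made-up module, Spin, whose only function calls itself in tail position, and samples the process twice with Process.info/2. Under these assumptions the reduction count should keep climbing while the memory figure stays roughly flat, which matches the pattern in the etop output.

```elixir
defmodule Spin do
  # Deliberately faulty: the recursive call is in tail position, so the
  # stack never grows, but the function never returns either.
  def loop(state), do: loop(state)
end

pid = spawn(fn -> Spin.loop(:some_state) end)

# Take a snapshot of the counters we care about for the spinning process.
sample = fn -> Process.info(pid, [:reductions, :memory, :current_function]) end

first = sample.()
Process.sleep(200)
second = sample.()

IO.inspect(first, label: "first sample")
IO.inspect(second, label: "second sample")

# Stop the runaway process; the :kill reason cannot be trapped.
Process.exit(pid, :kill)
```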

This is also very much a tribute to the praised "fault tolerance" of Erlang/Elixir: even though one branch of the program is stuck in an infinite loop, the rest keeps functioning normally and responds to requests in a timely manner. Very few platforms are able to do that.
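The reason is that the BEAM schedulers preempt processes based on their reduction count, so a spinning process cannot monopolize a scheduler. A small sketch of this, with a hypothetical Runaway module standing in for the faulty request handler and an Agent standing in for the rest of the application:

```elixir
defmodule Runaway do
  # Tail-recursive loop that never returns, like the stuck request handlers.
  def loop, do: loop()
end

spinner = spawn(&Runaway.loop/0)

# An unrelated process that answers requests while the spinner burns CPU.
{:ok, counter} = Agent.start_link(fn -> 0 end)

# Time 1000 synchronous calls made while the spinner is still running.
{micros, :ok} =
  :timer.tc(fn ->
    Enum.each(1..1_000, fn _ -> Agent.update(counter, &(&1 + 1)) end)
  end)

IO.puts("1000 calls answered in #{micros} microseconds while the spinner was running")
IO.inspect(Agent.get(counter, & &1), label: "counter")

# Clean up the runaway process.
Process.exit(spinner, :kill)
```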