5
votes

RabbitMQ crashed. RabbitMQ was working correctly for many days(10-15 days). I am not getting why it got crashed.


I am using RabbitMQ 3.4.0 on Erlang 17.0


The erlang has created dump file for the crash. Which shows

eheap_alloc: Cannot allocate 229520 bytes of memory (of type "old_heap").

Also note that the rabbitmq publish-subscribe message load is very low. (max:1-2 messages/second).And RabbitMQ messages are processed as it comes so RabbitMQ is almost empty all the time. The disk space & memory are also sufficient.

More system info:
Limiting to approx 8092 file handles (7280 sockets)
Memory limit set to 6553MB of 16383MB total.
Disk free limit set to 50MB.

The RabbitMQ logs are as below.

=ERROR REPORT==== 18-Jul-2015::04:29:31 ===
** Generic server rabbit_disk_monitor terminating 
** Last message in was update
** When Server state == {state,"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit@localhost-mnesia",
                               50000000,28358258688,100,10000,
                               #Ref<0.0.106.70488>,false}
** Reason for termination == 
** {eacces,[{erlang,open_port,
                    [{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit@localhost-mnesia\""},
                     [stream,in,eof,hide]],
                    []},
            {os,cmd,1,[{file,"os.erl"},{line,204}]},
            {rabbit_disk_monitor,get_disk_free,2,[]},
            {rabbit_disk_monitor,internal_update,1,[]},
            {rabbit_disk_monitor,handle_info,2,[]},
            {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},
            {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}

=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
                          [{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit@localhost-mnesia\""},
                           [stream,in,eof,hide]],
                          []},
                  {os,cmd,1,[{file,"os.erl"},{line,204}]},
                  {rabbit_disk_monitor,get_disk_free,2,[]},
                  {rabbit_disk_monitor,init,1,[]},
                  {gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
                  {proc_lib,init_p_do_apply,3,
                            [{file,"proc_lib.erl"},{line,239}]}]}},
 17179336704}

=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
                          [{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit@localhost-mnesia\""},
                           [stream,in,eof,hide]],
                          []},
                  {os,cmd,1,[{file,"os.erl"},{line,204}]},
                  {rabbit_disk_monitor,get_disk_free,2,[]},
                  {rabbit_disk_monitor,init,1,[]},
                  {gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
                  {proc_lib,init_p_do_apply,3,
                            [{file,"proc_lib.erl"},{line,239}]}]}},
 17179336704}

=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.167.0>
    registered_name: rabbit_disk_monitor
    exception exit: {eacces,
                        [{erlang,open_port,
                             [{spawn,
                                  "C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit@localhost-mnesia\""},
                              [stream,in,eof,hide]],
                             []},
                         {os,cmd,1,[{file,"os.erl"},{line,204}]},
                         {rabbit_disk_monitor,get_disk_free,2,[]},
                         {rabbit_disk_monitor,internal_update,1,[]},
                         {rabbit_disk_monitor,handle_info,2,[]},
                         {gen_server,handle_msg,5,
                             [{file,"gen_server.erl"},{line,599}]},
                         {proc_lib,init_p_do_apply,3,
                             [{file,"proc_lib.erl"},{line,239}]}]}
      in function  gen_server:terminate/6 (gen_server.erl, line 746)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
    messages: []
    links: [<0.166.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 4185
    stack_size: 27
    reductions: 481081978
  neighbours:

=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    child_terminated
     Reason:     {eacces,
                     [{erlang,open_port,
                          [{spawn,
                               "C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit@localhost-mnesia\""},
                           [stream,in,eof,hide]],
                          []},
                      {os,cmd,1,[{file,"os.erl"},{line,204}]},
                      {rabbit_disk_monitor,get_disk_free,2,[]},
                      {rabbit_disk_monitor,internal_update,1,[]},
                      {rabbit_disk_monitor,handle_info,2,[]},
                      {gen_server,handle_msg,5,
                          [{file,"gen_server.erl"},{line,599}]},
                      {proc_lib,init_p_do_apply,3,
                          [{file,"proc_lib.erl"},{line,239}]}]}
     Offender:   [{pid,<0.167.0>},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,{transient,1}},
                  {shutdown,4294967295},
                  {child_type,worker}]


=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.24989.51>
    registered_name: []
    exception exit: unsupported_platform
      in function  gen_server:init_it/6 (gen_server.erl, line 322)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
    messages: []
    links: [<0.166.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 650
  neighbours:

=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform
     Offender:   [{pid,<0.167.0>},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,{transient,1}},
                  {shutdown,4294967295},
                  {child_type,worker}]


=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.24991.51>
    registered_name: []
    exception exit: unsupported_platform
      in function  gen_server:init_it/6 (gen_server.erl, line 322)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
    messages: []
    links: [<0.166.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 650
  neighbours:

=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform
     Offender:   [{pid,{restarting,<0.167.0>}},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,{transient,1}},
                  {shutdown,4294967295},
                  {child_type,worker}]
3
I am also facing the same problem. Any help would be appreciatedNIrav Modi

3 Answers

2
votes

From the error message, rabbitmq can't open more files due to system limits. You can set max open file numbers to upper value to avoid the problem.

https://serverfault.com/questions/249477/windows-server-2008-r2-max-open-files-limit

1
votes

There are two unrelated errors here: one is the VM failure to allocate memory. Another is disk space monitor terminating. Disk space monitor is optional and on some less common platforms or with specific security restrictions, it is known to fail. That does not bring the VM down, and certainly has nothing to do with heap allocation failures.

The heap allocation failure typically comes down to two most common cases:

  • A known bug fixed in Erlang 17.x (don't recall which specific patch release, so use 17.5)
  • You run 32 bit Erlang/OTP on a 64 bit OS.

Chen Yu's comment about the EACCESS system call error is correct.

0
votes

I get analog error systemd unit for activation check: "rabbitmq-server.service" eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").^M ^M Crash dump is being written to: erl_crash.dump...done^M

ulimit -a 
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514979
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 514979
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

this is crush dump =erl_crash_dump:0.5

Wed Dec  2 17:16:31 2020
Slogan: eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").
System version: Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:512] [kernel-poll:true]
Compiled: Mon Feb  5 17:34:00 2018
Taints: crypto,asn1rt_nif,erl_tracer,zlib
Atoms: 34136
Calling Thread: scheduler:0
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:2
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process:
=scheduler:3
Scheduler Sleep Info Flags: 
Scheduler Sleep Info Aux Work: DELAYED_AW_WAKEUP | DD | THR_PRGR_LATER_OP
Current Port: 
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process: <0.12306.0>
Current Process State: Running
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING | TRAP_EXIT | ON_HEAP_MSGQ
Current Process Program counter: 0x00007f2f3ab3a060 (unknown function)
Current Process CP: 0x0000000000000000 (invalid)
Current Process Limited Stack Trace:
0x00007f2b50252d68:SReturn addr 0x32A6EC98 (rabbit_channel:handle_method/3 + 6712)
0x00007f2b50252d78:SReturn addr 0x32A69630 (rabbit_channel:handle_cast/2 + 4160)
0x00007f2b50252df8:SReturn addr 0x51102708 (gen_server2:handle_msg/2 + 1808)
0x00007f2b50252e28:SReturn addr 0x3FD85E70 (proc_lib:init_p_do_apply/3 + 72)
0x00007f2b50252e48:SReturn addr 0x7FFB4948 (<terminate process normally>)
=scheduler:4
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING