Apache Storm - Nimbus, Supervisors, Workers getting stopped silently

Question

I am using Apache Storm 0.9.5 with Java 1.7 and am facing the issue below.

All Storm processes die suddenly. I ran the topology once and left it running for 1 or 2 days without sending any data. When I checked afterwards, the processes were no longer running.

I have also set -XX:MaxPermSize=512m in storm.yaml on all the nodes for nimbus, supervisor and workers.

But the GC logs show:

PSPermGen       total 27136K, used 26865K [0x0000000760000000, 0x0000000761a80000, 0x0000000780000000)
  object space 27136K, 99% used [0x0000000760000000,0x0000000761a3c480,0x0000000761a80000)

Only 27 MB appears to be allotted for PermGen space. Is Storm not using the 512 MB?

Please let me know why all these processes die suddenly. Thank you.

In the storm.yaml configuration file, we can specify the options as below:
supervisor.childopts: "-Xmx1024m -XX:MaxPermSize=512m"
worker.childopts: "-Xmx2048m -XX:MaxPermSize=512m"
nimbus.childopts: "-Xmx2048m -XX:MaxPermSize=512m"
— Hariprasad Taduru
Did Nimbus, the Supervisors, or the Workers leave any error logs? If so, could you share them? — Jungtaek Lim
I checked the logs as well, but there is no error related to this issue in them. The logs look normal. — Hariprasad Taduru

Hariprasad Taduru · Accepted Answer · 2015-07-14T06:21:37

I added a monitoring process, "supervisord", to monitor the master nimbus and the supervisors. This way, the required processes are always up and running.

Since Storm follows a fail-fast design, a separate monitoring process is required to provide 24/7 HA support for the nimbus and supervisor processes.
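
As a rough illustration of the setup described above, a supervisord configuration could look something like the sketch below. The install path /opt/storm, the storm user, and the log locations are assumptions and not taken from the original setup; only the directive names (command, autostart, autorestart, startsecs, redirect_stderr, stdout_logfile) are standard supervisord options. The nimbus section belongs on the master node and the supervisor section on each worker node.

```ini
; Sketch only: paths, user name and log files are assumptions.

[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus ; hypothetical install path
user=storm ; hypothetical service account
autostart=true ; start when supervisord itself starts
autorestart=true ; restart whenever the process exits
startsecs=10 ; must stay up 10 s to count as started
redirect_stderr=true
stdout_logfile=/var/log/storm/nimbus-supervisord.log

[program:storm-supervisor]
command=/opt/storm/bin/storm supervisor ; hypothetical install path
user=storm
autostart=true
autorestart=true
startsecs=10
redirect_stderr=true
stdout_logfile=/var/log/storm/supervisor-supervisord.log
```

After editing the file, running supervisorctl reread followed by supervisorctl update should load the new sections, and supervisorctl status shows whether the daemons are running. Workers need no entry of their own, because the Storm supervisor daemon launches and restarts them.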
