0
votes

I am currently testing an ELK stack on a single Ubuntu 14.04 box with 6 GB of RAM and 1 TB of storage. This is pretty modest, but for the amount of data I am getting it should be plenty, right? I followed this ELK stack guide. In summary, I have Kibana 4, Logstash 1.5, and Elasticsearch 1.4.4 all running on one box, with an nginx server acting as a reverse proxy so I can access Kibana from outside. The main difference from the guide is that instead of syslogs, I am taking JSON input from a logstash-forwarder, at about 300 events/minute.

Once started, everything is fine -- the logs show up in Kibana and there are no errors. After about 3 hours, Elasticsearch crashes and I get a

Discover: Cannot read property 'indexOf' of undefined

error on the site. The logs can be seen on pastebin. It seems that shards become inactive and Elasticsearch updates the index_buffer_size.

If I refresh the Kibana UI, it starts working again for my JSON logs. However, if I test a different log source (a TCP input instead of lumberjack), I get similar errors to the above, except that log processing stops entirely -- anywhere from 10 minutes to an hour in, no more logs are processed, and I cannot stop Logstash unless I kill -KILL it.
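For reference, my input section looks roughly like this (the ports shown here are placeholders, not my exact config):

```
input {
  lumberjack {
    port => 5043                # placeholder port for logstash-forwarder
    ssl_certificate => "..."    # paths omitted
    ssl_key => "..."
  }
  tcp {
    port => 5000                # placeholder port for the TCP test source
    codec => json               # events arrive as JSON
  }
}
```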

Killing logstash (pid 13333) with SIGTERM
Waiting logstash (pid 13333) to die...
Waiting logstash (pid 13333) to die...
Waiting logstash (pid 13333) to die...
Waiting logstash (pid 13333) to die...
Waiting logstash (pid 13333) to die...
logstash stop failed; still running.

Logstash's error log output is in the pastebin above; the logstash .log file is empty...

For the TCP input, I get about 1500 events every 15 minutes, which Logstash bulk-inserts into Elasticsearch.

Any ideas here?

EDIT: I also observed that when my Elasticsearch process starts, the shard index buffer is set to a lower size:

[2015-05-08 19:19:44,302][DEBUG][index.engine.internal    ] [Eon] [logstash-2015.05.05][0] updating index_buffer_size from [64mb] to [4mb]
2
Which version of Logstash 1.5 are you running? RC4? RC3? Beta? The index buffer is probably a non-issue resulting from Elasticsearch deprioritizing the inactive index (from its perspective). – pickypg
Are there other errors, beyond just the index buffer messages? Like actual exceptions in the Elasticsearch logs? – pickypg
I believe I'm running the 1.5 Beta. I agree that the index buffer is most likely a non-issue -- Logstash seems to be the main problem. I do not see exceptions in the Elasticsearch logs, and nothing in logstash.log, though the logstash.err log is in the pastebin I attached. – jeffrey
Can you try upgrading to the latest 1.5 RC4 and retry? The beta is a few months old and a lot of issues have been fixed since then. – pickypg
Will do; I will let you know if anything changes. Thanks. – jeffrey

2 Answers

1
votes

@jeffrey, I had the same problem with the DNS filter.

I did two things. First, I installed dnsmasq as a caching DNS resolver; it helps if your DNS server has high latency or is under heavy load.

Second, I increased the number of Logstash worker threads using the -w option.

The worker-thread trick works even without dnsmasq; dnsmasq alone, without the extra threads, did not fix it for me.
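For example (the worker count here is just illustrative; tune it to your CPU count):

```
# start logstash with 4 filter worker threads (count is illustrative)
bin/logstash -w 4 -f /etc/logstash/conf.d/
```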

0
votes

I found my problem. It was in my Logstash configuration: the DNS filter (http://www.logstash.net/docs/1.4.2/filters/dns) for some reason caused Logstash to crash/hang. Once I took out the DNS filter, everything worked fine. Perhaps there was some error in the way the DNS filter was configured.
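For anyone hitting the same thing, the filter in question looked roughly like this (the field name is a placeholder from memory, not my exact config):

```
filter {
  dns {
    resolve => [ "host" ]    # field to look up -- placeholder name
    action => "replace"      # overwrite the field with the resolved value
  }
}
```

Removing this block from the pipeline stopped the hangs.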