12 votes

I am working on a solution for centralized log file aggregation from our CentOS 6.x servers. After installing the Elasticsearch/Logstash/Kibana (ELK) stack, I came across the Rsyslog omelasticsearch plugin, which can send messages from Rsyslog to Elasticsearch in Logstash format, and started asking myself why I need Logstash at all.

Logstash has a lot of different input plugins, including one that accepts Rsyslog messages. Is there a reason why I would use Logstash for my use case, where I need to gather the contents of log files from multiple servers? Also, is there a benefit to sending messages from Rsyslog to Logstash instead of sending them directly to Elasticsearch?
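For reference, a minimal omelasticsearch sketch looks roughly like this (the server address and index name are placeholders, and exact parameter names may vary between rsyslog versions):

    # Load the Elasticsearch output module
    module(load="omelasticsearch")

    # Ship everything to a local Elasticsearch node in bulk mode
    *.* action(type="omelasticsearch"
               server="localhost"
               serverport="9200"
               searchIndex="logstash-index"
               bulkmode="on")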


4 Answers

6 votes

I would use Logstash in the middle if there's something I need from it that rsyslog doesn't have, for example deriving GeoIP information from an IP address.
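A rough sketch of what that looks like as a Logstash filter (assuming the events carry a clientip field; the field name is made up for illustration):

    filter {
      geoip {
        # Field holding the IP address to look up (hypothetical name)
        source => "clientip"
      }
    }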

If, on the other hand, I just needed to get syslog or file contents indexed in Elasticsearch, I'd use rsyslog directly. It can do buffering (disk+memory) and filtering, you can choose what the document will look like (you can put the textual severity instead of the number, for example), and it can parse unstructured data. But the main advantage is performance, which rsyslog is focused on. Here's a presentation with some numbers (and tips and tricks) on Logstash, rsyslog and Elasticsearch: http://blog.sematext.com/2015/05/18/tuning-elasticsearch-indexing-pipeline-for-logs/
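As a sketch of the severity example: rsyslog exposes the textual severity as the syslogseverity-text property, so a template along these lines (the JSON layout here is purely illustrative) can emit the name instead of the number:

    template(name="json-with-severity" type="list") {
      constant(value="{\"severity\":\"")
      property(name="syslogseverity-text")
      constant(value="\",\"message\":\"")
      property(name="msg" format="json")
      constant(value="\"}")
    }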

3 votes

I would recommend Logstash. It is easier to set up, there are more examples, and the components are tested to fit together.

Also, there are some benefits: in Logstash you can filter and modify your logs (see the sketch after this list).

  1. You can extend logs with useful data: server name, timestamp, ...
  2. Cast types, e.g. string to int (useful for correct Elasticsearch index mappings)
  3. Filter out logs by some rules
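A rough sketch of such a filter (field names and values are made up for illustration; mutate and drop are standard Logstash filter plugins):

    filter {
      mutate {
        # 1. Enrich events with extra metadata (hypothetical field)
        add_field => { "datacenter" => "eu-west" }
        # 2. Cast a string field to an integer for correct mappings
        convert => { "bytes" => "integer" }
      }
      # 3. Drop events matching some rule, e.g. debug noise
      if [severity] == "debug" {
        drop { }
      }
    }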

Moreover, you can set the batch size to optimize saving to Elasticsearch. And if something goes wrong and there is a crazy amount of logs per second that Elasticsearch cannot process, you can configure Logstash to keep a queue of events, or to drop events that cannot be saved.
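For instance, newer Logstash versions (5.x and later) expose both knobs in logstash.yml; the values below are purely illustrative:

    # logstash.yml
    # Number of events a worker collects before flushing to outputs
    pipeline.batch.size: 250
    # Buffer events on disk instead of in memory
    queue.type: persistent
    # Cap the on-disk queue; when full, inputs are back-pressured
    queue.max_bytes: 1gb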

2 votes

If you go straight from the server to Elasticsearch, you can get the basic documents in (assuming the source is JSON, etc.). For me, the power of Logstash is adding value to the logs by applying business logic to modify and extend them.

Here's one example: syslog provides a severity level (0-7). I don't want a pie chart whose values are 0-7, so I make a new field that contains the pretty names ("emerg", "debug", etc.) that can be used for display.
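A sketch of one way to do this in Logstash is the stock syslog_pri filter (it reads the numeric PRI value from the syslog_pri field, which a grok pattern would typically have extracted first):

    filter {
      # Adds textual syslog_severity and syslog_facility fields,
      # e.g. "emergency" or "debug", alongside the numeric codes
      syslog_pri { }
    }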

Just one example...

2 votes

Neither is a viable option on its own if you really want to rely on the system to operate under load and be highly available.

We found that using rsyslog to send to a centralized location, buffering with Redis or Kafka, and then using Logstash to do its magic and ship to Elasticsearch is the best option.
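A sketch of the Logstash side of such a pipeline, assuming events are pushed as JSON onto a Redis list named "syslog" (host, key, and addresses are placeholders):

    input {
      redis {
        host => "redis.example.com"
        # Consume events from a Redis list acting as the buffer
        data_type => "list"
        key => "syslog"
        codec => "json"
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
    }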

Read our blog about it here - http://logz.io/blog/deploy-elk-production/

(Disclaimer - I am the VP product for logz.io and we offer ELK as a service)