
I have recently been looking into setting up an Elasticsearch cluster, and I'm not sure how to coordinate actions among multiple Logstash instances. I need to regularly index data from a database, which I can do with the jdbc input plugin. My problem is coordinating that job across instances: if I run it on only one instance and that instance goes down, the data never reaches Elasticsearch. On the other hand, if I run it on every instance, I'm protected from the failure of any one of them, but I end up with duplicate data in Elasticsearch.

I believe a load balancer in front of the Logstash instances, fed by logstash-forwarder or filebeat, would avoid this problem, because each event would be delivered to exactly one instance. Unfortunately, I can only query the database from my Logstash machines; I can't install logstash-forwarder or filebeat on the database server itself.
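For reference, here is a minimal sketch of the kind of jdbc input pipeline in question; the connection string, driver, table, and column names are placeholders, not anything from a real setup:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://db.example.com:5432/mydb"  # placeholder
    jdbc_user => "logstash"
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    schedule => "*/5 * * * *"   # poll every five minutes
    # :sql_last_value is tracked by the plugin between runs
    statement => "SELECT * FROM events WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # A deterministic document_id makes re-ingestion idempotent: the same
    # row indexed twice overwrites one document instead of creating two.
    document_id => "%{id}"
  }
}
```

One thing worth noting: if the elasticsearch output is given a deterministic `document_id` derived from the row (as above), running the same query from more than one instance produces overwrites rather than duplicate documents, which may make the multi-instance option more tolerable.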


1 Answer


I have an ftp{} input plugin that I wrote which suffers from the same problem you describe. Right now it's installed on only one Logstash machine, and moving it to another Logstash after a system failure would require an Ansible run. Not ideal.

I'm starting to dislike logstash for gathering data, and am leaning towards external programs for gathering and logstash-forwarder for shipping.

In my case, I'll rewrite the ftp gatherer as a stand-alone script that puts files into a directory watched by logstash-forwarder. Of course, there's no inherent redundancy there, either, but at least I can restart my logstash instances at will.
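As a sketch of that approach (the FTP host, remote directory, and watched directory are all hypothetical), the one detail that matters is writing into the watched directory atomically, so logstash-forwarder never picks up a half-written file. Writing to a temp file in the same directory and renaming it into place does that on POSIX filesystems:

```python
import os
import tempfile


def save_atomically(directory, name, data):
    """Write data to directory/name so a watcher never sees a partial file.

    logstash-forwarder may read a file as soon as it appears, so we write to
    a temporary file in the same directory first; rename() within a single
    filesystem is atomic on POSIX, so the file appears fully written.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.rename(tmp_path, os.path.join(directory, name))
    except Exception:
        os.unlink(tmp_path)  # don't leave temp debris for the watcher
        raise


def fetch_ftp_files(host, remote_dir, watch_dir):
    """Hypothetical stand-alone gatherer: pull every file from an FTP
    directory into the directory logstash-forwarder watches."""
    from ftplib import FTP
    ftp = FTP(host)
    ftp.login()  # anonymous; add credentials as needed
    ftp.cwd(remote_dir)
    for name in ftp.nlst():
        chunks = []
        ftp.retrbinary("RETR " + name, chunks.append)
        save_atomically(watch_dir, name, b"".join(chunks))
    ftp.quit()
```

Run from cron (or a systemd timer), this gives the same "no coordination" properties the answer describes: still no inherent redundancy, but the gatherer and Logstash can be restarted independently.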