I have recently been looking into setting up an Elasticsearch cluster, and I am unsure how to coordinate work across multiple Logstash instances. I need to index data from a database on a regular schedule, which I can do with the jdbc input plugin. The problem is deciding where that input should run. If I run it on only one instance and that instance goes down, the data will not be ingested into Elasticsearch. If I run it on multiple instances, I am protected against the failure of any one instance, but every instance will run the same query and I will end up with duplicate data in Elasticsearch.
I believe a load balancer in front of Logstash, combined with logstash-forwarder or Filebeat, would avoid this problem, because the data would be spread across the instances rather than sent to every one of them. Unfortunately, I can only query the database from Logstash; I can't install logstash-forwarder or Filebeat on the database server itself.
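
For concreteness, the kind of pipeline I mean looks roughly like the sketch below. This is illustrative only: the driver, connection string, credentials, table name, column name, index name, and schedule are all placeholders rather than my real setup.

```
input {
  jdbc {
    # Placeholder driver and connection details; substitute your own.
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://db.example.com:5432/mydb"
    jdbc_user => "logstash"
    jdbc_password => "changeme"

    # Cron-style schedule: run the query once a minute.
    schedule => "* * * * *"

    # Hypothetical table and column; fetch only rows newer than the last run.
    statement => "SELECT * FROM events WHERE id > :sql_last_value ORDER BY id"
    use_column_value => true
    tracking_column => "id"
  }
}

output {
  elasticsearch {
    # Placeholder cluster address and index name.
    hosts => ["http://localhost:9200"]
    index => "events"
  }
}
```

Each Logstash instance keeps its own record of the last `id` it read, so if I deploy this same file on two instances, both will fetch the same rows and both will index them, which is the duplication I described above.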