I'm running a 3-node Elasticsearch 2.1 cluster inside Docker containers on 3 AWS hosts. elasticsearch.yml contains the following, besides some other settings:
network.host: 0.0.0.0
discovery.type: "ec2"
discovery.ec2.ping_timeout: "30s"
discovery.zen.ping.multicast.enabled: false
cloud.aws.access_key: ...
cloud.aws.secret_key: ...
cloud.aws.region: ...
On the command line I have:
-Des.network.bind_host=0.0.0.0 -Des.cluster.name=XXX -Des.node.name=XXX-1 (up to XXX-3)
Data is stored on EBS volumes dynamically mounted on node startup; the AWS Cloud Plugin is installed.
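For context, each node container is started roughly like this (the image tag, container name, volume path, and port mappings are illustrative, not my exact command):

# each AWS host runs one node; data lives on the EBS mount
docker run -d --name es-XXX-1 \
  -p 9200:9200 -p 9300:9300 \
  -v /mnt/ebs/es-data:/usr/share/elasticsearch/data \
  elasticsearch:2.1 \
  -Des.network.bind_host=0.0.0.0 -Des.cluster.name=XXX -Des.node.name=XXX-1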
Everything worked fine, including restarts and updates, until the entire system went through a general networking problem; I'm not sure what happened there. Since then, each node starts, claims it is running in the XXX cluster, but declares itself master:
[cluster.service ] [XXX-3] new_master {XXX-3}{5oQHbq_KS8-JrIuFfTTBdw}{192.168.AAA.BB}{192.168.CCC.DD:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
I managed to solve this by setting network.host: _ec2_ on one of the hosts. On startup, that host successfully connected to one of the running instances, and only after that did the third node manage to join the cluster too, even with network.host: 0.0.0.0. Now everything is running fine again.
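For reference, the change on that one node was a single line; as I understand it, _ec2_ is a special value provided by the AWS Cloud Plugin that resolves to the instance's private IP, so the node publishes a reachable address instead of the wildcard:

# elasticsearch.yml on the fixed node; the other settings are unchanged
# (_ec2_ should be equivalent to _ec2:privateIp_, per my reading of the plugin docs)
network.host: _ec2_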
I'm trying to understand why I had this problem. Isn't it legal to use 0.0.0.0 on AWS? And how did the cluster start working again after changing network.host to _ec2_ on only one node?
Another point: I'd like to run the same service with the same command line in a local environment (Vagrant), but I cannot use ec2 discovery in that case.
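For illustration, locally I could override the discovery settings with the standard ES 2.x command-line properties, something like the following (the unicast host IPs are just placeholders for my Vagrant boxes):

# switch back to plain zen discovery and point the nodes at each other
-Des.discovery.type=zen -Des.discovery.zen.ping.unicast.hosts=192.168.33.10,192.168.33.11,192.168.33.12

But that would mean a different command line per environment, which is exactly what I'd like to avoid.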
Thanks in advance