I'm running a 3-node Elasticsearch 2.1 cluster inside Docker containers on 3 AWS hosts. elasticsearch.yml contains the following, besides some other settings:
network.host: 0.0.0.0
discovery.type: "ec2"
discovery.ec2.ping_timeout: "30s"
discovery.zen.ping.multicast.enabled: false
cloud.aws.access_key: ...
cloud.aws.secret_key: ...
cloud.aws.region: ...
On the command line I have:
-Des.network.bind_host=0.0.0.0 -Des.cluster.name=XXX -Des.node.name=XXX-1 (up to XXX-3)
Data is stored on EBS volumes dynamically mounted on node startup; the AWS Cloud Plugin is installed.
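For context, each node container is started roughly like this (the image tag, container name, volume path, and port mappings are illustrative, not my exact command):

# each AWS host runs one node; data lives on the EBS mount
docker run -d --name es-XXX-1 \
  -p 9200:9200 -p 9300:9300 \
  -v /mnt/ebs/es-data:/usr/share/elasticsearch/data \
  elasticsearch:2.1 \
  -Des.network.bind_host=0.0.0.0 -Des.cluster.name=XXX -Des.node.name=XXX-1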
Everything worked fine, including restarts and updates, until the entire system went through a general networking problem; I'm not sure what happened there. Since then, each node starts, claims it is running in the XXX cluster, but declares itself master:
[cluster.service ] [XXX-3] new_master {XXX-3}{5oQHbq_KS8-JrIuFfTTBdw}{192.168.AAA.BB}{192.168.CCC.DD:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
I managed to solve this by setting network.host: _ec2_ on one of the hosts. On startup, that host successfully connected to one of the running instances, and only after that did the third node manage to join the cluster too, even with network.host: 0.0.0.0. Now everything is running fine again.
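For reference, the change on that one node was a single line; as I understand it, _ec2_ is a special value provided by the AWS Cloud Plugin that resolves to the instance's private IP, so the node publishes a reachable address instead of the wildcard:

# elasticsearch.yml on the fixed node; the other settings are unchanged
# (_ec2_ should be equivalent to _ec2:privateIp_, per my reading of the plugin docs)
network.host: _ec2_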
I'm trying to understand why I had this problem. Isn't it legal to use 0.0.0.0 on AWS? And how did the cluster start working again after changing network.host to _ec2_ on only one node?
Another point: I'd like to run the same service with the same command line in a local environment (Vagrant), but I cannot use ec2 discovery in that case.
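For illustration, locally I could override the discovery settings with the standard ES 2.x command-line properties, something like the following (the unicast host IPs are just placeholders for my Vagrant boxes):

# switch back to plain zen discovery and point the nodes at each other
-Des.discovery.type=zen -Des.discovery.zen.ping.unicast.hosts=192.168.33.10,192.168.33.11,192.168.33.12

But that would mean a different command line per environment, which is exactly what I'd like to avoid.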
Thanks in advance