
My objective is to create an Elasticsearch cluster in AWS using EC2 discovery.

I have 3 instances, each running Elasticsearch. I have given each instance an IAM role that allows it to describe EC2 instances. Each instance is in the security group "sec-group-elasticsearch".

The nodes start but do not find each other (logs below).

I can telnet from one node to another using the private DNS name and port 9300.

For example, telnet from node A to B works, and from B to A works:

telnet ip-xxx-xxx-xx-xxx.vpc.fakedomain.com 9300

IAM role policy for each instance:

{
    "Statement": [
        {
            "Action": [
                "ec2:DescribeInstances"
            ],
            "Effect": "Allow",
            "Resource": [
                "*"
            ]
        }
    ],
    "Version": "2012-10-17"
}

Security group rules:

Inbound
Custom TCP Rule TCP 9200 - 9400 0.0.0.0/0

Outbound
All traffic allowed

elasticsearch.yml

bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-ec2-elasticsearch
discovery: ec2
discovery.ec2.groups: sec-group-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
http.port: 9200
network.host: _ec2:privateDns_
node.data: false
node.master: true
transport.tcp.port: 9300

On startup each instance logs like so:

[2016-03-02 03:13:48,128][INFO ][node                     ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] version[2.1.0], pid[26976], build[72cd1f1/2015-11-18T22:40:03Z]
[2016-03-02 03:13:48,129][INFO ][node                     ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] initializing ...
[2016-03-02 03:13:48,592][INFO ][plugins                  ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] loaded [cloud-aws], sites [head]
[2016-03-02 03:13:48,620][INFO ][env                      ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [11.4gb], net total_space [14.6gb], spins? [no], types [ext4]
[2016-03-02 03:13:50,928][INFO ][node                     ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] initialized
[2016-03-02 03:13:50,928][INFO ][node                     ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] starting ...
[2016-03-02 03:13:51,065][INFO ][transport                ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com/xxx-xxx-xx-xxx:9300}, bound_addresses {xxx-xxx-xx-xxx:9300}
[2016-03-02 03:13:51,074][INFO ][discovery                ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] my-ec2-elasticsearch/xVOkfK4TT-GWaPln59wGxw
[2016-03-02 03:14:21,075][WARN ][discovery                ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] waited for 30s and no initial state was set by the discovery
[2016-03-02 03:14:21,084][INFO ][http                     ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com/xxx-xxx-xx-xxx:9200}, bound_addresses {xxx-xxx-xx-xxx:9200}
[2016-03-02 03:14:21,085][INFO ][node                     ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] started

With TRACE logging enabled for discovery:

[2016-03-02 04:25:27,753][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] failed to connect to {#zen_unicast_2#}{::1}{[::1]:9300}
ConnectTransportException[[][[::1]:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /0:0:0:0:0:0:0:1:9300];
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:916)
    at ..............



[2016-03-02 04:25:29,253][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] connecting (light) to {#zen_unicast_1#}{127.0.0.1}{127.0.0.1:9300}
[2016-03-02 04:25:29,253][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] sending to {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}
[2016-03-02 04:25:29,254][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] received response from {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}: [ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[143], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[145], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[147], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[149], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[151], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[153], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node 
[{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[154], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}]
[2016-03-02 04:25:29,253][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] connecting (light) to {#zen_unicast_2#}{::1}{[::1]:9300}
[2016-03-02 04:25:29,254][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] failed to connect to {#zen_unicast_1#}{127.0.0.1}{127.0.0.1:9300}
ConnectTransportException[[][127.0.0.1:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /127.0.0.1:9300];
    at ...........

[2016-03-02 04:25:29,255][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] failed to connect to {#zen_unicast_2#}{::1}{[::1]:9300}
ConnectTransportException[[][[::1]:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /0:0:0:0:0:0:0:1:9300];
    at 
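The discovery trace output above comes from raising the log level in Elasticsearch's `config/logging.yml`. A minimal excerpt, assuming the stock Elasticsearch 2.x file, where per-package levels live under the `logger:` key and logger names omit the `org.elasticsearch.` prefix (which is why the log lines show `discovery.zen.ping.unicast`):

```yaml
# config/logging.yml (Elasticsearch 2.x) - excerpt only; all other entries omitted.
logger:
  # TRACE on the parent "discovery" logger; child loggers such as
  # discovery.zen.ping.unicast and discovery.ec2 inherit this level.
  discovery: TRACE
```

Setting the level on the parent package rather than on `discovery.ec2` alone means the generic zen pinging is logged as well, which is what exposes the localhost-only ping targets.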
Since you've disabled multicast (good), you're missing the unicast hosts list discovery.zen.ping.unicast.hosts: ["a.b.c.d"] in your configuration. List all of your nodes in that list. - Val
I was under the impression that the EC2 discovery plugin would look up hosts by security group and treat them all as potential cluster mates? - mconlin
Can you modify your logging.yml file to raise the logging level for the EC2 plugin with discovery: TRACE? It should print out more information about the discovery process. - Val
TRACE logging added above. - mconlin
Interesting. Can you try adding that unicast hosts list, just to see the difference? - Val
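Val's diagnostic suggestion amounts to giving zen discovery an explicit seed list instead of relying on EC2 lookups. A sketch for elasticsearch.yml; the three hostnames are hypothetical placeholders for each node's private DNS name, and port 9300 matches `transport.tcp.port` in the question's config:

```yaml
# elasticsearch.yml - explicit unicast seed list, for diagnosis only.
# Hostnames are hypothetical placeholders; substitute each node's private DNS name.
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node-a.vpc.fakedomain.com:9300", "node-b.vpc.fakedomain.com:9300", "node-c.vpc.fakedomain.com:9300"]
```

If the nodes join each other with this list in place, the network path is fine (as the telnet test already suggests), and the problem is that EC2-based host discovery is not being activated.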

1 Answer


You have a tiny typo in your elasticsearch.yml configuration file:

discovery: ec2

should read:

discovery.type: ec2
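Putting it together, here is the question's elasticsearch.yml with only that key changed. This is a sketch assuming Elasticsearch 2.1 with the cloud-aws plugin, as reported in the startup log; every other line is copied unchanged from the question. With the misspelled key, the setting was most likely ignored, so the node fell back to pinging only its own local addresses (the 127.0.0.1 and [::1] targets in the trace log). With `discovery.type: ec2`, it should instead build its ping list from the EC2 DescribeInstances API, filtered by `discovery.ec2.groups`.

```yaml
# elasticsearch.yml - question's settings with the discovery key corrected.
# Assumes Elasticsearch 2.1 + cloud-aws plugin (per the startup log).
bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-ec2-elasticsearch
discovery.type: ec2                       # was "discovery: ec2"
discovery.ec2.groups: sec-group-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
http.port: 9200
network.host: _ec2:privateDns_
node.data: false
node.master: true
transport.tcp.port: 9300
```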