0
votes

Failing to establish a Vert.x cluster between verticles in a cloud environment

I'm having trouble configuring a clustered Vert.x event bus in a private cloud environment.

In a laboratory test, I'm trying to make two verticles establish a cluster using the Hazelcast cluster manager, each one running in its own container.

The problem is probably caused by a misconfiguration, but I'm unable to find it. Multicast is not possible on this cloud, so I'm using the TCP/IP discovery strategy.

The initial plan is a "labatf-api" verticle (a REST call receiver) that propagates the business processing, through the event bus, to be executed in the "labatf-vtx" verticle.

Below is the cluster configuration fragment of the "labatf-api" verticle:

import com.hazelcast.config.Config;
import com.hazelcast.config.NetworkConfig;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.eventbus.EventBusOptions;
import io.vertx.core.spi.cluster.ClusterManager;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

Config hazelcastConfig = new Config();
NetworkConfig networkConfig = new NetworkConfig();

networkConfig
    .setPort(5701)
    .getJoin()
        .getMulticastConfig()
            .setEnabled(false);
networkConfig
    .getJoin()
        .getAwsConfig()
            .setEnabled(false);
networkConfig
    .getJoin()
        .getTcpIpConfig()
            .setEnabled(true)           
            .addMember("labatf-vtx:5701");

hazelcastConfig.setNetworkConfig(networkConfig);

ClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);
VertxOptions options = new VertxOptions()
    .setClusterManager(mgr)
    .setEventBusOptions(new EventBusOptions()                           
            .setClusterPublicHost("labatf-api")
            .setClusterPublicPort(5701))
    .setClustered(true);

Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        ...
    }
});

And the "labatf-vtx" verticle's configuration (the imports are the same):

Config hazelcastConfig = new Config();
NetworkConfig networkConfig = new NetworkConfig();

networkConfig
    .setPort(5701)
    .getJoin()
        .getMulticastConfig()
            .setEnabled(false);
networkConfig
    .getJoin()
        .getAwsConfig()
            .setEnabled(false);
networkConfig
    .getJoin()
        .getTcpIpConfig()
            .setEnabled(true)           
            .addMember("labatf-api:5701");

hazelcastConfig.setNetworkConfig(networkConfig);

ClusterManager mgr = new HazelcastClusterManager(hazelcastConfig);
VertxOptions options = new VertxOptions()
    .setClusterManager(mgr)
    .setEventBusOptions(new EventBusOptions()                           
            .setClusterPublicHost("labatf-vtx")
            .setClusterPublicPort(5701))
    .setClustered(true);

Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        ...
    }
});

Note that "labatf-api" and "labatf-vtx" are module names in the cloud environment, but they are also DNS names for service IPs that load-balance calls among the container replicas of each module, if any exist.

After starting the verticle containers, each module discovers the other, but a few seconds later the connection is closed by the destination peer, as the logs below show:

In the "labatf-api" verticle:

INFO:   [192.168.84.205]:5701 [dev] [3.9] Accepting socket connection from /192.168.80.253:52191
INFO:   [192.168.84.205]:5701 [dev] [3.9] Established socket connection between /192.168.84.205:5701 and /192.168.80.253:52191
WARNING:[192.168.84.205]:5701 [dev] [3.9] Wrong bind request from [192.168.80.253]:5701! This node is not the requested endpoint: [labatf-api]:5701
INFO: [192.168.84.205]:5701 [dev] [3.9] Connection[id=2, /192.168.84.205:5701->/192.168.80.253:52191, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [192.168.80.253]:5701! This node is not the requested endpoint: [labatf-api]:5701
INFO: [192.168.84.205]:5701 [dev] [3.9] Connection[id=5, /192.168.84.205:45323->labatf-vtx/10.36.232.241:5701, endpoint=[labatf-vtx]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side

In the "labatf-vtx" verticle:

INFO: [192.168.80.253]:5701 [dev] [3.9] Accepting socket connection from /192.168.84.205:60711
INFO: [192.168.80.253]:5701 [dev] [3.9] Established socket connection between /192.168.80.253:5701 and /192.168.84.205:60711
WARNING: [192.168.80.253]:5701 [dev] [3.9] Wrong bind request from [192.168.84.205]:5701! This node is not the requested endpoint: [labatf-vtx]:5701
INFO: [192.168.80.253]:5701 [dev] [3.9] Connection[id=3, /192.168.80.253:5701->/192.168.84.205:60711, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [192.168.84.205]:5701! This node is not the requested endpoint: [labatf-vtx]:5701
INFO: [192.168.80.253]:5701 [dev] [3.9] Connection[id=4, /192.168.80.253:55987->labatf-api/10.36.212.47:5701, endpoint=[labatf-api]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side

Any help will be welcome!


1 Answer

0
votes

You cannot connect Hazelcast nodes through a load balancer. Hazelcast nodes have to talk to each other directly; we do not use HTTP(S) but a custom, TCP/IP-based protocol. That is why each node rejects the incoming connection with the "Wrong bind request" warning: the connecting member asks for the load-balanced service name (e.g. "labatf-api:5701"), but the node that actually receives the connection identifies itself by its own container IP, so the endpoint check fails and the connection is closed.
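As a sketch of the direction to take (assuming your platform can expose the individual container IPs, e.g. through a headless service whose DNS name resolves to the member pods themselves rather than to a load-balanced virtual IP; the name "labatf-vtx-headless" below is hypothetical), the TCP/IP join should list addresses that reach the members directly:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.config.NetworkConfig;

public class DirectMemberConfig {

    public static Config build() {
        Config hazelcastConfig = new Config();
        NetworkConfig networkConfig = hazelcastConfig.getNetworkConfig();
        networkConfig.setPort(5701);

        JoinConfig join = networkConfig.getJoin();
        join.getMulticastConfig().setEnabled(false);
        join.getAwsConfig().setEnabled(false);
        join.getTcpIpConfig()
            .setEnabled(true)
            // Hypothetical headless-service name: it must resolve to the
            // container IPs of the individual members, not to a
            // load-balanced service VIP.
            .addMember("labatf-vtx-headless:5701");

        // If members sit behind NAT, you can advertise the address that
        // peers should use to reach this node, so the endpoint check in
        // the bind request matches:
        // networkConfig.setPublicAddress("10.36.232.241:5701"); // example

        return hazelcastConfig;
    }
}
```

The key design point is that every address in the member list must belong to exactly one Hazelcast node, so that the endpoint a member advertises is the same one its peers dialed.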