OpsCenter cannot recognize Cassandra nodes in the same network

Question

I am experimenting with DataStax OpsCenter 5.2 and Cassandra 2.1.7. One problem I have run into is that the OpsCenter daemon (i.e., the server) seems to try to connect to the Cassandra agents using the broadcast_rpc_address, which is blocked by the security group (because broadcast_rpc_address is a public IP on AWS).

Details

The cluster has three nodes (10.0.0.0/24 is the subnet of a VPC on AWS, 52.x.x.x is a public IP)

Node0

cassandra.yaml: broadcast_address=10.0.0.100, rpc_address=10.0.0.100, broadcast_rpc_address=52.2.3.100

address.yaml: stomp_interface=10.0.0.99, local_interface=10.0.0.100, agent_rpc_broadcast_address=10.0.0.100

Node1

cassandra.yaml: broadcast_address=10.0.0.101, rpc_address=10.0.0.101, broadcast_rpc_address=52.2.3.101

address.yaml: stomp_interface=10.0.0.99, local_interface=10.0.0.101, agent_rpc_broadcast_address=10.0.0.101

Node2

cassandra.yaml: broadcast_address=10.0.0.102, rpc_address=10.0.0.102, broadcast_rpc_address=52.2.3.102

address.yaml: stomp_interface=10.0.0.99, local_interface=10.0.0.102, agent_rpc_broadcast_address=10.0.0.102

OpsCenter Node

Deployed in the same subnet

ip=10.0.0.99

Symptoms

After entering "10.0.0.100, 10.0.0.101, 10.0.0.102" in the "Add Cluster" window of the OpsCenter web console, I got the following in opscenterd.log:

2015-09-04 11:05:38+0000 []  INFO: New Cassandra host 52.2.3.100 discovered
2015-09-04 11:05:38+0000 []  INFO: New Cassandra host 52.2.3.101 discovered
...
2015-09-04 11:05:43+0000 []  WARN: [control connection] Error connecting to 52.2.3.100: errors=Timed out creating connection, last_host=None
2015-09-04 11:05:43+0000 [] ERROR: Control connection failed to connect, shutting down Cluster: ('Unable to connect to any servers', {'52.2.3.100': OperationTimedOut('errors=Timed out creating connection, last_host=None',)})

Notice that OpsCenter tries to connect to the nodes via their broadcast_rpc_address, which is blocked by the security group. This happens even though I have set agent_rpc_broadcast_address to the subnet IPs.

Question 1

Is this the correct behavior of OpsCenter? Why is agent_rpc_broadcast_address not used?

Question 2

If I change broadcast_rpc_address to the subnet IPs, then OpsCenter connects fine. But this prevents my clients from connecting, because non-seed nodes will have their subnet IPs reported by seed nodes to the client, and those IPs are not reachable by the client.

I could also open up the security group to the OpsCenter server, but this is risky and requires going through the gateway.

So how should I solve the problem in this case?

Thoughts

The core of this problem is how to "intelligently" decide which IP to connect to, depending on whether a client is inside or outside the subnet. None of the documentation I have seen makes it clear how this works.
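To make the decision concrete, here is a minimal sketch of the inside/outside choice in plain Python. It is not an OpsCenter or Cassandra feature: the mapping table and the function name `pick_address` are hypothetical, and the addresses are the ones listed in the Details section. Some Cassandra client drivers offer an address-translation hook that plays a similar role, but whether OpsCenter 5.2 can be configured that way is not established here.

```python
import ipaddress

# Hypothetical table mapping each node's public broadcast_rpc_address
# to its private subnet address (values taken from the Details section).
PUBLIC_TO_PRIVATE = {
    "52.2.3.100": "10.0.0.100",
    "52.2.3.101": "10.0.0.101",
    "52.2.3.102": "10.0.0.102",
}

# The VPC subnet from the question.
VPC_SUBNET = ipaddress.ip_network("10.0.0.0/24")


def pick_address(advertised, client_ip):
    """Return the address a client at client_ip should connect to."""
    inside = ipaddress.ip_address(client_ip) in VPC_SUBNET
    if inside:
        # Inside the subnet: prefer the private address when one is known.
        return PUBLIC_TO_PRIVATE.get(advertised, advertised)
    # Outside the subnet: only the advertised public address is reachable.
    return advertised


print(pick_address("52.2.3.100", "10.0.0.99"))    # the OpsCenter host, inside the subnet
print(pick_address("52.2.3.100", "203.0.113.7"))  # an example external client
```

The first call yields the private address and the second keeps the public one, which is the behavior I would like from a single configuration.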

Thanks for any help.

Addition 1

I would also be grateful if you could clarify how the rpc (Thrift) and native (binary) protocols are used by clients and by OpsCenter.

I have the impression that rpc is deprecated in favor of the native protocol, but will this affect inter-node and client-node connections?

@LHWizard, commenting out broadcast_rpc_address will cause clients to try to connect to the subnet IP (for any non-seed node). For a client not in the subnet, this will fail. — stackoverflower
My intention is to get both a) an OpsCenter in the same subnet and b) a client outside the subnet, to work. — stackoverflower
Also, your statement in Question 2 is wrong. The seeds don't report IP addresses to the clients. — LHWizard

arre arre · Accepted Answer · 2015-09-04T22:17:32

Question 1

Is this the correct behavior of OpsCenter? Why is agent_rpc_broadcast_address not used?

It is at the moment. OpsCenter (via the underlying Python driver) gets the rpc addresses from Cassandra itself, so it receives the values of broadcast_rpc_address, which are not reachable from the OpsCenter host. agent_rpc_broadcast_address is used to connect to the agents, not to the Cassandra nodes themselves.
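As a rough illustration of that discovery step, here is a minimal sketch in plain Python. It only simulates the idea: the row layout is simplified, the function name `discovered_hosts` is hypothetical, and the addresses are the ones from the question. It is not the driver's actual code.

```python
# Simulated rows of the kind a driver reads from a node's peer table.
# Each node advertises its broadcast_rpc_address as rpc_address.
PEER_ROWS = [
    {"peer": "10.0.0.101", "rpc_address": "52.2.3.101"},
    {"peer": "10.0.0.102", "rpc_address": "52.2.3.102"},
]


def discovered_hosts(rows):
    """Return the address that would be used to reach each peer."""
    hosts = []
    for row in rows:
        addr = row.get("rpc_address")
        # Fall back to the peer address only when no usable rpc address is advertised.
        if not addr or addr == "0.0.0.0":
            addr = row["peer"]
        hosts.append(addr)
    return hosts


print(discovered_hosts(PEER_ROWS))
```

Because the nodes advertise their public addresses, those are what the connection attempts use, regardless of the private IPs entered in the "Add Cluster" window and regardless of anything in address.yaml.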

I’m not sure why you are blocking access to the broadcast address from within the same subnet (the same security group, even?) while also allowing access to it from outside the subnet.