Failing over to a remote datacenter - Cassandra

Question

We are facing a problem we did not expect in Cassandra. We have a cluster of 6 nodes split across two datacenters (see image 1 below): http://s9.postimg.org/vyiykbosf/Cassandra_normal.png Unfortunately, we recently lost 3 nodes (see image 2 below) and were not able to keep the cluster fully available: http://postimg.org/image/yy3o6w10r/

On each datacenter we have a read consistency of ONE and a write consistency of LOCAL_QUORUM. The problem is that we lost two nodes in the same datacenter, and when the coordinator was the only available node in that datacenter, LOCAL_QUORUM could not be satisfied for writes.
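To make the failure concrete, here is a minimal sketch of the quorum arithmetic. It assumes a replication factor of 3 in each datacenter (the question does not state it), and the class and method names are made up for illustration. A quorum is a strict majority of replicas, and LOCAL_QUORUM only counts replicas in the coordinator's own datacenter.

```java
// Quorum arithmetic for the scenario described above.
// Assumption: replication factor 3 in each datacenter.
final class QuorumMath {

    // A quorum is a strict majority of the replicas: floor(rf / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // LOCAL_QUORUM only counts live replicas in the coordinator's datacenter.
    static boolean localQuorumPossible(int localReplicationFactor, int liveLocalReplicas) {
        return liveLocalReplicas >= quorum(localReplicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3; // assumed replication factor per datacenter
        System.out.println("replicas needed for quorum: " + quorum(rf));
        System.out.println("1 of 3 local nodes up, write possible: " + localQuorumPossible(rf, 1));
        System.out.println("2 of 3 local nodes up, write possible: " + localQuorumPossible(rf, 2));
    }
}
```

Under that assumption, a datacenter with a single surviving node can never reach the two acknowledgements LOCAL_QUORUM needs, no matter which local node coordinates the write.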

We know there is the onWriteTimeout method, but we do not want to lower the consistency level. Is it therefore possible to switch the coordinator when LOCAL_QUORUM cannot be met? (i.e., when the coordinator is in Datacenter II and the write is not possible, a retry switches the coordinator to an available node in Datacenter I.)

We found the class DCAwareRoundRobinPolicy, but I'm not sure how it really works or whether it fits our needs. Do you know how the hosts in the remote datacenter are chosen? Where is the list of those hosts set?

Regards,

Stephen Walsh · Accepted Answer · 2015-11-20T13:05:26

Sorry, my first reply was deleted as it only asked whether you had ever found an answer.

However, I eventually did find an answer.

So if you have 2 DCs with 3 nodes each and replication factor 3, and you want to still achieve LOCAL_QUORUM should one DC fail or one node in a DC fail, then you need to connect to the cluster using this:

http://grepcode.com/file/repo1.maven.org/maven2/com.datastax.cassandra/cassandra-driver-core/2.0.7/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.java#172

Line 172,

Set "localDc" to your DC name, e.g. "DC1"
Set "usedHostsPerRemoteDc" to the number of nodes to query in DC2, e.g. 3
Set "allowRemoteDCsForLocalConsistencyLevel" to true.
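As far as I know, in the 2.0.x driver these three names are the parameters of a DCAwareRoundRobinPolicy constructor, in that order. The sketch below is not the driver's source. It is a simplified, standalone model of the order in which such a policy offers hosts as coordinators, written to show what each setting changes. The class name, the host names, and the localConsistencyLevel flag are assumptions made for illustration, and the model leaves out round-robin rotation and retries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (NOT the driver's code) of a DC-aware host ordering.
final class DcAwarePlanSketch {

    // Returns the hosts to try as coordinator, in order.
    // localConsistencyLevel is true for levels such as LOCAL_QUORUM.
    static List<String> queryPlan(String localDc,
                                  int usedHostsPerRemoteDc,
                                  boolean allowRemoteDCsForLocalConsistencyLevel,
                                  boolean localConsistencyLevel,
                                  Map<String, List<String>> liveHostsByDc) {
        // Live hosts of the local datacenter always come first.
        List<String> plan = new ArrayList<>(
                liveHostsByDc.getOrDefault(localDc, Collections.<String>emptyList()));

        // For LOCAL_* levels, remote hosts are skipped unless explicitly allowed.
        if (localConsistencyLevel && !allowRemoteDCsForLocalConsistencyLevel) {
            return plan;
        }

        // Then at most usedHostsPerRemoteDc live hosts from each remote datacenter.
        for (Map.Entry<String, List<String>> dc : liveHostsByDc.entrySet()) {
            if (dc.getKey().equals(localDc)) {
                continue;
            }
            List<String> hosts = dc.getValue();
            int count = Math.min(usedHostsPerRemoteDc, hosts.size());
            plan.addAll(hosts.subList(0, count));
        }
        return plan;
    }

    // Hypothetical cluster state: every DC1 node is down, DC2 is healthy.
    static Map<String, List<String>> exampleCluster() {
        Map<String, List<String>> live = new LinkedHashMap<>();
        live.put("DC1", Collections.<String>emptyList());
        live.put("DC2", Arrays.asList("dc2-a", "dc2-b", "dc2-c"));
        return live;
    }

    public static void main(String[] args) {
        Map<String, List<String>> live = exampleCluster();
        System.out.println("remote allowed:     " + queryPlan("DC1", 3, true, true, live));
        System.out.println("remote not allowed: " + queryPlan("DC1", 3, false, true, live));
    }
}
```

In this model, a LOCAL_QUORUM write from a DC1 client has no coordinator to try when DC1 is down unless the third setting is true. Once it is true, the write goes to a DC2 coordinator, and the quorum is then counted among the replicas of that coordinator's datacenter.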

In our testing, our applications switched over to the remote DC when we killed the local one.

But note: this comes with a consistency warning ... as this would potentially break consistency guarantees and if you are fine with that, it's probably better to use a weaker consistency like ONE, TWO or THREE
