Is it possible to have a “local” keyspace in a Cassandra cluster with multiple datacenters?

Question

Can I prevent a keyspace from syncing over to another datacenter by NOT including the other datacenter in my keyspace replication definition? Apparently, this is not the case.

In my own test, I have set up two Kubernetes clusters in GCP, each serving as a Cassandra datacenter. Each k8s cluster has 3 nodes.

I set up datacenter DC-WEST first, and create a keyspace demo using this: CREATE KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', 'DC-WEST': 3};

Then I set up datacenter DC-EAST, without adding any user keyspaces.

To join the two datacenters, I modify the CASSANDRA_SEEDS environment variable in the Cassandra StatefulSet YAML to include seed nodes from both datacenters (I use host networking).

But after that, I notice the keyspace demo is synced over to DC-EAST, even though the keyspace only has DC-WEST in the replication.

cqlsh> select data_center from system.local
... ;

data_center
-------------
DC-EAST     <-- Note: this is from the DC-EAST datacenter

(1 rows)
cqlsh> desc keyspace demo

CREATE KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', 'DC-WEST': '3'}  AND durable_writes = true;

So the demo keyspace is visible in DC-EAST, even though it should be replicated only to DC-WEST! What am I doing wrong?

Jeff Jirsa · Accepted Answer · 2019-04-10T04:53:23

Cassandra replication strategies control where data is placed, but the actual schema (the existence of keyspaces, tables, their replication settings, etc.) is global.

If you create a keyspace that only lives in one DC, all other DCs will still see the keyspace in their schema, and will even make the directory structure on disk, though no data will be replicated to those hosts.
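One way to see this distinction is to read the schema tables from a node in DC-EAST. The query below is a minimal sketch, assuming Cassandra 3.0 or later (where schema metadata lives in the system_schema keyspace) and the demo keyspace from the question. It should return the keyspace definition on every node, because that definition is schema rather than data.

```cql
-- Run from cqlsh on a node in either datacenter.
-- Assumption: Cassandra 3.0+, where keyspace definitions are stored in system_schema.keyspaces.
-- The row describes where data is placed; it does not mean this node holds any of that data.
SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name = 'demo';
```

If the replication map lists only DC-WEST, the nodes in DC-EAST know about the keyspace but are not replicas for it. Running nodetool status demo should show the same thing from the data side, since it reports effective ownership per node for that keyspace.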

