Data partitioning in Cassandra for multiple datacenters with varying data

Question

So far, I've read up on data partitioning in Cassandra and found some basic approaches. For example, if you have 6 nodes, 3 in each of two separate data centers, data is replicated as follows:

Cassandra walks the ring until it reaches a node that belongs to another data center and places a replica there, repeating the process until every data center holds one copy of the data (this is how NetworkTopologyStrategy works).
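To make the placement rule concrete, here is a minimal Python sketch of a simplified NetworkTopologyStrategy walk over a six-node, two-DC ring like the one described. The tokens, node names and DC names (`DC1`, `DC2`) are hypothetical, and the real strategy also spreads replicas across racks, which this sketch ignores.

```python
from bisect import bisect_left

# Hypothetical 6-node ring, 3 nodes per data center: (token, node, data center).
# Tokens alternate between DCs, so the walk has to skip nodes to change DC.
RING = sorted([
    (0, "dc1-a", "DC1"),
    (100, "dc2-a", "DC2"),
    (200, "dc1-b", "DC1"),
    (300, "dc2-b", "DC2"),
    (400, "dc1-c", "DC1"),
    (500, "dc2-c", "DC2"),
])
TOKENS = [token for token, _, _ in RING]


def place_replicas(partition_token, replication):
    """Simplified NetworkTopologyStrategy placement (racks ignored).

    Start at the first node whose token is >= the partition token (wrapping
    around the ring), then walk clockwise, taking a node whenever its data
    center still needs replicas according to `replication` ({dc: count}).
    """
    start = bisect_left(TOKENS, partition_token) % len(RING)
    needed = dict(replication)
    replicas = []
    for step in range(len(RING)):
        _, node, dc = RING[(start + step) % len(RING)]
        if needed.get(dc, 0) > 0:
            replicas.append(node)
            needed[dc] -= 1
        if not any(needed.values()):
            break  # every data center has all of its replicas
    return replicas


if __name__ == "__main__":
    # One copy per data center, as in the question.
    print(place_replicas(150, {"DC1": 1, "DC2": 1}))  # ['dc1-b', 'dc2-b']
```

With `{"DC1": 1, "DC2": 1}` every partition ends up with exactly one replica in each data center, which is the "one copy per data center" layout from the question; raising a count (e.g. `{"DC1": 2}`) just makes the walk keep going until that DC has enough nodes.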

So we have two copies of the entire data set, one in each data center. But what if I wanted to split the data logically into two separate chunks based on some attribute such as business unit or geographic location (for example, data for India stored in an India data center)? We would then have one chunk of data in the data centers in one geographic location, another chunk in another location, and no overlap between them.

Would that be possible? And given how Cassandra, and Big Data in general, are typically used, would it make sense?

rs_atl · Accepted Answer · 2014-09-09T16:08:47

Geographic sharding is certainly possible. If you run multiple data centers that aren't connected to each other, they won't replicate. Alternatively, you can let them replicate but have your India-based app read from and write to only your India DC. Whether this makes sense depends on your application.
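As an illustration of keeping the chunks apart inside a single cluster: NetworkTopologyStrategy replication is configured per keyspace, and a keyspace's replication map only needs to list the data centers that should hold it. The minimal Python sketch below builds such CQL statements, one keyspace per region. The region keys, the `<region>_data` keyspace naming, the DC names `India` and `US`, and the replication factor of 3 are all hypothetical; the DC names must match what your snitch reports.

```python
# Hypothetical mapping from a business/geographic attribute to the data center
# that should own that region's data. DC names must match your snitch's names.
REGION_TO_DC = {"india": "India", "us": "US"}


def keyspace_for(region: str) -> str:
    """Hypothetical naming scheme: one keyspace per region."""
    return f"{region}_data"


def create_keyspace_cql(region: str, rf: int = 3) -> str:
    """Build a CREATE KEYSPACE statement whose NetworkTopologyStrategy
    replication map names only the region's own data center, so replicas
    of this keyspace are placed in that DC and nowhere else."""
    dc = REGION_TO_DC[region]
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace_for(region)} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', '{dc}': {rf}}};"
    )


if __name__ == "__main__":
    for region in REGION_TO_DC:
        print(create_keyspace_cql(region))
```

An India-based app would then write only to `india_data`, and because that keyspace lists no other DC, Cassandra keeps no replicas of it outside `India`. The trade-off is that losing the India DC leaves no copy of that data anywhere else.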
