I have 2 Cassandra clusters in different datacenters (note that these are 2 separate clusters, NOT a single cluster with multi-DC), and both clusters have the same keyspace and column family models. I wish to copy the data of column family C from cluster A to cluster B in the most efficient way. I was able to copy some other column families with get and put operations, since they were time series and the keys were sequential, but I could not copy this column family C that way. I'm using Thrift and pycassa. I've tried the CQL COPY command, but unfortunately the CF is too large and I get an rpc_timeout. How can I accomplish this?
3 Answers
If you just want to do this as a one-time thing, take a snapshot and use sstableloader to load it into the other cluster. If you want to keep loading new data over time, turn on incremental_backups, take a snapshot to load the initial data, and then periodically grab the sstables out of the incremental backups and run them through sstableloader to keep things up to date.
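A minimal sketch of the one-time path, assuming a keyspace named my_keyspace, a table my_table, and example target IPs 10.1.0.1,10.1.0.2 (all of these are placeholders, not from the question):

```bash
# On a node in cluster A: snapshot just the one table.
# The snapshot files land under the table's data directory,
# in a snapshots/copy_to_b subfolder.
nodetool snapshot -t copy_to_b -cf my_table my_keyspace

# Copy the snapshot's sstable files (e.g. with rsync/scp) to a machine
# that can reach cluster B, into a directory laid out as <keyspace>/<table>/,
# then stream them into cluster B:
sstableloader -d 10.1.0.1,10.1.0.2 /path/to/my_keyspace/my_table
```

sstableloader streams the data to all replicas according to cluster B's own topology, which is why it works across two independent clusters as long as the schema already exists on the target.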
From time to time I also need to copy data from one Cassandra cluster to another. I use this tool: https://github.com/masumsoft/cassandra-exporter. The export.js script exports data to JSON files, and the import.js script imports the exported data into Cassandra. You can do this for all tables in a specified keyspace or for a particular table only. The target keyspace and tables must exist before the import.
In the scripts you can adjust the batch size and readTimeout if you get a "read timeout error".
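Roughly what a run looks like; the host/keyspace values are placeholders and the flag spellings are assumptions, so check the repo's README for the exact invocation of your version:

```bash
git clone https://github.com/masumsoft/cassandra-exporter.git
cd cassandra-exporter
npm install

# Export every table in a keyspace to JSON files
# (--host/--keyspace flag names are assumptions -- see the repo README)
node export.js --host 10.0.0.1 --keyspace my_keyspace

# Import into the target cluster
# (the keyspace and tables must already exist there)
node import.js --host 10.1.0.1 --keyspace my_keyspace
```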
UPDATE: After a hint by Alex Ott I tried the DSBulk tool. It works great, but only for one table per run. If you want to process a full keyspace, you need a script that runs DSBulk for each table.
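A sketch of such a script, assuming dsbulk and cqlsh are on the PATH and using placeholder hosts and a hypothetical keyspace my_keyspace; it lists the tables from the source cluster's schema, then unloads each one to CSV and loads it into the target:

```bash
#!/usr/bin/env bash
# Placeholder hosts/keyspace -- substitute your own.
SRC=10.0.0.1
DST=10.1.0.1
KS=my_keyspace

# List the keyspace's tables via the source cluster's system schema.
# The tail/head trimming strips cqlsh's header and row-count lines and is
# approximate (GNU coreutils assumed); adjust for your cqlsh output.
tables=$(cqlsh "$SRC" -e \
  "SELECT table_name FROM system_schema.tables WHERE keyspace_name='$KS';" \
  | tail -n +4 | head -n -2)

for t in $tables; do
  # One DSBulk run per table: unload to CSV, then load into the target.
  dsbulk unload -h "$SRC" -k "$KS" -t "$t" -url "./export/$t"
  dsbulk load   -h "$DST" -k "$KS" -t "$t" -url "./export/$t"
done
```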