2
votes

I have 2 Cassandra clusters in different datacenters (note that these are 2 separate clusters, NOT a single cluster with multi-DC), and both clusters have the same keyspace and column family models. I want to copy the data of column family C from cluster A to cluster B in the most efficient way. I was able to copy another column family with get and put operations, since it was a time series and the keys were sequential, but I couldn't copy this column family C that way. I'm using Thrift and pycassa. I've tried the CQL COPY command, but unfortunately the CF is too large and I get an rpc_timeout. How can I accomplish this?


3 Answers

1
votes

If you just want to do this as a one-time thing, take a snapshot and use sstableloader to load it into the target cluster. If you want to keep loading new data over time, turn on incremental_backups, take a snapshot to load the initial data, and then periodically grab the sstables out of the incremental backups and run them through sstableloader to keep things up to date.
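
A minimal sketch of the snapshot-and-load flow, assuming the standard Cassandra tools are on the path; the keyspace name MyKeyspace, the snapshot tag, the data directory layout, and the target IP are illustrative:

# on each node of cluster A: snapshot the keyspace that holds C
nodetool snapshot -t c_copy MyKeyspace

# the snapshot files land under the data directory, e.g.
#   /var/lib/cassandra/data/MyKeyspace/C/snapshots/c_copy/
# sstableloader expects the last two path components to be keyspace/table,
# so copy the snapshot files into a matching directory and stream them to cluster B:
mkdir -p /tmp/load/MyKeyspace/C
cp /var/lib/cassandra/data/MyKeyspace/C/snapshots/c_copy/* /tmp/load/MyKeyspace/C/
sstableloader -d <cluster_B_node_ip> /tmp/load/MyKeyspace/C

For the ongoing case, incremental backups are enabled by setting incremental_backups: true in cassandra.yaml on cluster A's nodes.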

0
votes

I don't have much knowledge on how to copy Cassandra data from one cluster to another, but for the rpc_timeout error you can use:

cqlsh --request-timeout 3600 <IP address>

Use the above command to enter the CQL shell. The request timeout is given in seconds; you can increase it as needed.

0
votes

From time to time I also need to copy data from one Cassandra cluster to another. I use this tool: https://github.com/masumsoft/cassandra-exporter. The export.js script exports data to JSON files, and the import.js script imports the exported data into Cassandra. You can do it for all tables in a specified keyspace or for a particular table only. The target keyspace and tables must exist before the import.

In the js scripts you can adjust the batch size and readTimeout if you get a "read timeout" error.

UPDATE: After a hint from Alex Ott I tried the DSBulk tool. It works great, but only for one table per run. If you want to process a full keyspace, you need a script that runs DSBulk for each table, as in the sketch below.
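
A minimal sketch of such a wrapper script, assuming dsbulk is on the path; the host names, keyspace, table list, and dump directory are illustrative (on Cassandra 3+ the table list could also be pulled from system_schema.tables):

#!/bin/sh
# copy every listed table of keyspace "ks" from cluster A to cluster B via DSBulk
KS=ks
TABLES="c other_table"    # e.g. from: SELECT table_name FROM system_schema.tables WHERE keyspace_name='ks';
for T in $TABLES; do
  dsbulk unload -h cluster_a_host -k "$KS" -t "$T" -url "/tmp/dump/$T"
  dsbulk load   -h cluster_b_host -k "$KS" -t "$T" -url "/tmp/dump/$T"
done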