Loading Cassandra data with SStableloader from different Cassandra cluster

Question

I have two different independent machines running Cassandra and I want to migrate the data from one machine to the other.

Thus, I first took a snapshot of my Cassandra Cluster on machine 1 according to the datastax documentation.

Then I moved the data to machine 2, where I'm trying to import it with sstableloader.

As a note: The keypsace (open_weather) and tablename (raw_weather_data) on the machine 2 have been created and are the same as on machine 1.

The command I'm using looks as follows:

bin/sstableloader -d localhost "path_to_snapshot"/open_weather/raw_weather_data

And then get the following error:

Established connection to initial hosts
Opening sstables and calculating sections to stream
For input string: "CompressionInfo.db"
java.lang.NumberFormatException: For input string: "CompressionInfo.db"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:276)
    at org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:235)
    at org.apache.cassandra.io.sstable.Component.fromFilename(Component.java:120)
    at org.apache.cassandra.io.sstable.SSTable.tryComponentFromFilename(SSTable.java:160)
    at org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:84)
    at java.io.File.list(File.java:1161)
    at org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:78)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:162)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106)

Unfortunately I have no idea why?

I'm not sure if it is related to the issue, but somehow on machine 1 my *.db files are name rather "strange" as compared to the *.db files I already have on machine 2.

*.db files from machine 1:

la-53-big-CompressionInfo.db
la-53-big-Data.db
...
la-54-big-CompressionInfo.db
...

*.db files from machine 2:

open_weather-raw_weather_data-ka-5-CompressionInfo.db
open_weather-raw_weather_data-ka-5-Data.db

What am I missing? Any help would be highly appreciated. I'm also open to any other suggestions. The COPY command will most probably not work since it is Limited to 99999999 rows as far as I know.

P.s. I didn't want to create a overly huge post, but if you need any further information to help me out, just let me know.

EDIT: Note that I'm using Cassandra in the stand-alone mode.

EDIT2: After installing the same version 2.1.4 on my destination machine (machine 2), I still get all the same error. With SSTableLoader I still get the above mentioned error and with copying the files manually (as described by LHWizard), I still get empty tables after starting Cassandra again and performing a SELECT command.

Regarding the initial tokens, I get a huge list of tokens if I perform node ring on machine 1. I'm not sure what to do with those?

The second machine has the same keyspaces and tables as the 1st machine. Initially it also had some values in it. I first tried it with the other data inside and then with an empty table. Same result in both cases. — Kito

LHWizard LHWizard · Accepted Answer · 2016-01-28T05:02:36

your data is already in the form of a snapshot (or backup). What I have done in the past is the following:

install the same version of cassandra on the restore node
edit cassandra.yaml on the restore node - make sure that cluster_name and snitch are the same.
edit seeds: list and any other properties that were altered in the original node.
get the schema from the original node using cqlsh DESC KEYSPACE.
start cassandra on the restore node and import the schema. (steps 6 & 7 may not be completely necessary, but this is what I do.)
stop cassandra, delete the contents of /var/lib/cassandra/data/, commitlog/, and saved_caches/* folders.
restart cassandra on the restore node to recreate the correct folders, then stop it
copy the contents of the snapshots folder to each corresponding table folder in the restore node, then start cassandra. You probably want to run nodetool repair.

You don't really need to bulk import the data, it's already in the correct format if you are using the same version of cassandra, although you didn't specify that in your original question.

Loading Cassandra data with SStableloader from different Cassandra cluster

1 Answers