I'm using Cassandra 2.0.9 to store fairly large amounts of data, say 100 GB, in one column family. I would like to export this data to CSV quickly. I tried:
- sstable2json: it produces very large JSON files that are hard to parse, because the tool puts the data in one row and uses a complicated schema (e.g. a 300 MB data file becomes ~2 GB of JSON). Dumping also takes a long time, and Cassandra tends to change the source file names according to its internal mechanism
- COPY: causes timeouts on reasonably fast EC2 instances when the number of records is large
- CAPTURE: like COPY, causes timeouts
- reads with pagination: I paginated on a timeuuid column, but this returns only about 1,500 records per second
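A minimal sketch of the paginated-read approach, under stated assumptions: it uses the DataStax Python driver (`cassandra-driver`) and its built-in paging via `fetch_size` rather than manual paging on a timeuuid column, and the hosts, keyspace, table and column names are hypothetical placeholders. Rows are streamed straight to the CSV writer so memory use stays flat regardless of table size.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_rows_as_csv(rows: Iterable[Sequence], header: Sequence[str], out: TextIO) -> int:
    """Stream rows to CSV one at a time (no buffering of the whole result); return the row count."""
    writer = csv.writer(out)
    writer.writerow(header)
    count = 0
    for row in rows:
        writer.writerow(row)  # non-string values (UUIDs, datetimes) are written via str()
        count += 1
    return count


def rows_to_csv_string(rows: Iterable[Sequence], header: Sequence[str]) -> str:
    """Convenience helper for previewing a few rows as CSV text."""
    buf = io.StringIO()
    write_rows_as_csv(rows, header, buf)
    return buf.getvalue()


def export_table(hosts, keyspace, table, columns, path, page_size=5000):
    """Hypothetical export routine: page through `table` and write it to `path` as CSV."""
    # Imported lazily so the CSV helpers above work even without the driver installed.
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(hosts)
    try:
        session = cluster.connect(keyspace)
        # fetch_size controls the page size; the driver requests the next page
        # transparently while the ResultSet is being iterated.
        query = SimpleStatement(
            "SELECT {} FROM {}".format(", ".join(columns), table),
            fetch_size=page_size,
        )
        rows = session.execute(query)
        with open(path, "w", newline="") as f:
            return write_rows_as_csv(rows, columns, f)
    finally:
        cluster.shutdown()
```

Example call with placeholder values: `export_table(["10.0.0.1"], "my_keyspace", "my_table", ["id", "ts", "payload"], "out.csv")`. A single sequential scan like this is bounded by one client's round trips, which is consistent with the throughput limit described above.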
I'm using an Amazon EC2 instance with fast storage, 15 GB of RAM and 4 cores.
Is there a better option for exporting gigabytes of data from Cassandra to CSV?