
I am generating Cassandra SSTables using the bulk loading sample from the DataStax blog: http://www.datastax.com/dev/blog/bulk-loading

My question is: how much disk space should the SSTable files ideally consume? In my case the source CSV file is 40 GB, but the SSTables generated from it take up around 250 GB. Is there something I am missing while creating these tables? Are there any compression options available when generating SSTables?
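For reference, here is a minimal sketch of how compression can be declared in the schema passed to the SSTable writer. It assumes the newer CQLSSTableWriter API rather than the older SSTableSimpleUnsortedWriter shown in the linked post; the keyspace, table, and column names are made up for illustration, and the exact compression option key varies across Cassandra versions:

```java
import java.io.File;
import java.util.UUID;

import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class CompressedSSTableWriterSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical keyspace/table. Compression is declared on the table
        // itself; the option key is 'sstable_compression' on 1.x/2.0 and
        // 'class' on 2.1+, so adjust to your version.
        String schema = "CREATE TABLE myks.mytable ("
                + " id uuid PRIMARY KEY,"
                + " payload text"
                + ") WITH compression = {'class': 'LZ4Compressor'}";
        String insert = "INSERT INTO myks.mytable (id, payload) VALUES (?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(new File("/tmp/myks/mytable")) // output directory for the SSTable components
                .forTable(schema)
                .using(insert)
                .build();

        writer.addRow(UUID.randomUUID(), "example row");
        writer.close(); // flushes the final SSTable to disk
    }
}
```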

The second step, where I load the SSTables using sstableloader, works perfectly fine, and the data is available for querying in CQL.

Also, I would like to know whether there are any other techniques for importing large amounts of data into Cassandra besides the bulk load method mentioned above.


1 Answer


First of all, check whether compression is enabled. How do you check that?

If the SSTable is compressed, it will have a CompressionInfo.db component (i.e. one of the files composing the SSTable will end with -CompressionInfo.db). If there is no such file, the SSTable is not compressed.
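If you want to script that check, here is a small sketch that scans an SSTable directory for the component; the path is a placeholder:

```java
import java.io.File;

public class CompressionCheck {
    public static void main(String[] args) {
        // Placeholder path: point this at the directory containing the
        // generated SSTable component files (Data.db, Index.db, ...).
        File sstableDir = new File("/tmp/myks/mytable");
        File[] components = sstableDir.listFiles();
        boolean compressed = false;
        if (components != null) {
            for (File f : components) {
                // A compressed SSTable carries a *-CompressionInfo.db component.
                if (f.getName().endsWith("-CompressionInfo.db")) {
                    compressed = true;
                    break;
                }
            }
        }
        System.out.println(compressed ? "SSTables are compressed" : "SSTables are NOT compressed");
    }
}
```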

For further compression-related information, check this.

Moving to your last question: yes, there is an alternative to the bulk load method, the cqlsh COPY command, e.g. `COPY myks.mytable FROM 'data.csv' WITH HEADER = true;` (keyspace, table, and file names here are placeholders). See the documentation. A driver-based alternative is sketched below.
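If COPY is too slow or you need more control (transformation, throttling), a plain driver-based load is another common route, though it is not mentioned above. A minimal sketch assuming a DataStax Java driver 3.x-style API; the contact point, keyspace, table, and file names are all placeholders, and real code should bound its in-flight requests:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.UUID;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class CsvDriverImport {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            PreparedStatement ps = session.prepare(
                    "INSERT INTO myks.mytable (id, payload) VALUES (?, ?)");
            try (BufferedReader reader = new BufferedReader(new FileReader("data.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Assumes a two-column CSV: a uuid followed by a text payload.
                    String[] cols = line.split(",", 2);
                    BoundStatement bound = ps.bind(UUID.fromString(cols[0]), cols[1]);
                    // Fire-and-forget for brevity; production code should
                    // track the futures and throttle concurrency.
                    session.executeAsync(bound);
                }
            }
        }
    }
}
```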