I am generating Cassandra SSTables using the bulk-loading sample from the DataStax website: http://www.datastax.com/dev/blog/bulk-loading
My question is: how much disk space should the SSTable files be expected to consume? In my case the source CSV file is 40 GB, and the SSTables generated from it take up around 250 GB on disk. Is there something I am missing when creating these tables? Are there any compression options available when generating SSTables?
The second step, where I load the SSTables using sstableloader, works fine, and the data is available for querying in CQL.
Also, I would like to know whether there are any other techniques for importing large amounts of data into Cassandra besides the bulk-load method mentioned above.