5
votes

We are currently evaluating Apache Cassandra 1.2 as a large-scale data processing solution. Our application is read-intensive, and to give users the fastest possible response time we would like to configure Cassandra to keep all data in memory.

Is it enough to set the caching storage option to rows_only on all column families and give each Cassandra node sufficient memory to hold its portion of the data? Or are there other options in Cassandra?


2 Answers

9
votes

Read performance tuning is much more complex than write tuning. Based on my experience, there are several factors to take into consideration. Some of these points are not memory-related, but they also help improve read performance.

1. Row Cache: avoids disk reads for cached rows, but enable it only if the rows are not updated frequently. You can also use the off-heap row cache to reduce JVM heap usage.

2. Key Cache: enabled by default, and there is no need to disable it. It avoids a disk seek to locate the row when the row cache misses.

3. Reduce the frequency of memtable flushes: adjust memtable_total_space_in_mb, commitlog_total_space_in_mb, and flush_largest_memtables_at in cassandra.yaml.

4. Use LeveledCompactionStrategy: it limits how many SSTables a single row can be spread across, so a read touches fewer files.
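
As a rough sketch of how points 1, 2 and 4 look at the table level, the CQL3 statements below use the Cassandra 1.2 option syntax. The keyspace and table name (myks.users) are hypothetical placeholders, and 'all' is chosen here instead of 'rows_only' on the assumption that you want to keep the key cache as well. Note that the row cache also has to be given capacity on each node (row_cache_size_in_mb in cassandra.yaml), otherwise the table-level setting has nothing to fill.

```sql
-- Hypothetical table name; substitute your own keyspace and column family.
-- Cassandra 1.2 accepts 'all', 'keys_only', 'rows_only' or 'none' here.
-- 'all' caches both keys and rows, so the key cache still helps on row cache misses.
ALTER TABLE myks.users WITH caching = 'all';

-- Switch the same table to leveled compaction so a row lives in fewer SSTables.
ALTER TABLE myks.users
  WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
```

Changing the compaction strategy on an existing table triggers recompaction of its data, so it is worth trying on a test cluster before applying it everywhere.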

1
votes

DataStax added an in-memory option in DataStax Enterprise 4.0, its commercial NoSQL database built on Apache Cassandra, as part of a drive to increase the performance of online applications.

Reference:

http://www.datastax.com/2014/02/welcome-to-datastax-enterprise-4-0-and-opscenter-4-1