1
votes

I am building an application that processes very large amounts of data (more than 3 million rows). I am new to Cassandra and am using a 5-node Cassandra cluster to store the data. I have two column families:

Table 1 : CREATE TABLE keyspace.table1 (
    partkey1 text,
    partkey2 text,
    clusterKey1 text,
    attributes text,
    PRIMARY KEY ((partkey1, partkey2), clusterKey1)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Table 2 : CREATE TABLE keyspace.table2 (
    partkey1 text,
    partkey2 text,
    clusterKey2 text,
    attributes text,
    PRIMARY KEY ((partkey1, partkey2), clusterKey2)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Note: clusterKey1 and clusterKey2 are randomly generated UUIDs.

My concern is that, according to nodetool cfstats, I am getting good throughput on Table 1 with these stats:

  • SSTable count: 2
  • Space used (total): 365189326
  • Space used by snapshots (total): 435017220
  • SSTable Compression Ratio: 0.2578485727722293
  • Memtable cell count: 18590
  • Memtable data size: 3552535
  • Memtable switch count: 171
  • Local read count: 0
  • Local read latency: NaN ms
  • Local write count: 2683167
  • Local write latency: 1.969 ms
  • Pending flushes: 0
  • Bloom filter false positives: 0
  • Bloom filter false ratio: 0.00000
  • Bloom filter space used: 352

whereas for Table 2 I am getting very bad read performance, with these stats:

  • SSTable count: 33
  • Space used (live): 212702420
  • Space used (total): 212702420
  • Space used by snapshots (total): 262252347
  • SSTable Compression Ratio: 0.1686948750752438
  • Memtable cell count: 40240
  • Memtable data size: 24047027
  • Memtable switch count: 89
  • Local read count: 24027
  • Local read latency: 0.580 ms
  • Local write count: 1075147
  • Local write latency: 0.046 ms
  • Pending flushes: 0
  • Bloom filter false positives: 0
  • Bloom filter false ratio: 0.00000
  • Bloom filter space used: 688

I was wondering why table2 creates 33 SSTables and why its read performance is so poor. Can anyone help me figure out what I am doing wrong here?

This is how I query the table:

    // selectStamt is a field (not a local), so the statement is prepared once and cached;
    // note that table2's clustering column is clusterKey2, not clusterKey1
    private BoundStatement selectStamt;

        if (selectStamt == null) {
            PreparedStatement prprdStmnt = session
                    .prepare("select * from table2 where clusterKey2 = ? and partkey1 = ? and partkey2 = ?");
            selectStamt = new BoundStatement(prprdStmnt);
        }
        synchronized (selectStamt) {
            res = session.execute(selectStamt.bind("clusterKey2Value", "partkey1Value", "partkey2Value"));
        }

In another thread, I am doing update operations on this table, on different data, in the same way.

As for throughput, I am measuring the number of records processed per second, and it is processing only 50-80 records/sec.

3
Low (sub-ms) "Local read latency" is something good. – Schildmeijer
Yeah, 500 microseconds would make almost every high-frequency trader happy :). Although cfstats (depending on the version, I think) does get reset after execution, so maybe the performance Rijo is seeing is worse than this shows. – Andy Tolbert

3 Answers

5
votes

When you have a lot of SSTables, the distribution of your data among them is very important. Since you are using SizeTieredCompactionStrategy, SSTables get compacted and merged roughly when there are 4 SSTables of about the same size.

If you are updating data within the same partition frequently and at different times, it's likely your data is spread across many SSTables, which will degrade read performance, as a single read may have to consult several SSTables.

In my opinion, the best way to confirm this is to execute cfhistograms on your table:

nodetool -h localhost cfhistograms keyspace table2

Depending on the version of Cassandra you have installed, the output will differ, but it includes a histogram of the number of SSTables consulted per read operation.

If you are updating data within the same partition frequently and at different times, you could consider using LeveledCompactionStrategy (When to use Leveled Compaction). LCS will keep data from the same partition together in the same SSTable within a level which greatly improves read performance, at the cost of more Disk I/O doing compaction. In my experience, the extra compaction disk I/O more than pays off in read performance if you have a high ratio of reads to writes.
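If you do decide to try LCS, the switch itself is a single schema change. As a sketch (the sstable_size_in_mb value is only illustrative; 160 MB is the usual default, and note that the change triggers a round of recompaction of existing SSTables):

```sql
ALTER TABLE keyspace.table2
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': '160'};
```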


EDIT: With regards to your question about your throughput concerns, there are a number of things that are limiting your throughput.

  1. A possibly big issue is that, unless you have many threads making that same query at a time, you are making your requests serially (one at a time). By doing this, you are severely limiting your throughput, as another request cannot be sent until you get a response from Cassandra. Also, since you are synchronizing on selectStamt, even if this code were executed by multiple threads, only one request could be executed at a time anyway. You can dramatically improve throughput by having multiple worker threads make the request for you (if you aren't already doing this), or even better, use executeAsync to execute many requests asynchronously. See Asynchronous queries with the Java driver for an explanation of how the request flow works in the driver and how to use the driver effectively to make many queries.
  2. If you are executing this same code each time you make a query, you are creating an extra round trip by calling session.prepare each time to create your PreparedStatement. session.prepare sends a request to Cassandra to prepare your statement. You only need to do this once, and you can reuse the PreparedStatement each time you make a query. You may be doing this already, given your statement null-checking (can't tell without more code).
  3. Instead of reusing selectStamt and synchronizing on it, just create a new BoundStatement off the single PreparedStatement you are using each time you make a query. That way no synchronization is needed at all.
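To put rough numbers on point 1, here is a plain-Java back-of-the-envelope estimate (no driver code involved; the 15 ms round-trip figure is an assumed illustration, not a measurement from the question) of how the number of in-flight requests bounds throughput, per Little's law:

```java
public class ThroughputEstimate {

    // Little's law: max requests/sec ~= in-flight requests / latency (in seconds).
    // One synchronized request at a time means latency alone caps throughput.
    static double maxThroughputPerSec(int inFlightRequests, double latencyMs) {
        return inFlightRequests * 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        // Hypothetical end-to-end latency: client -> coordinator -> replicas -> back.
        double latencyMs = 15.0;

        // Serial, synchronized requests: in the same ballpark as the
        // 50-80 records/sec observed in the question.
        System.out.printf("1 in-flight:  ~%.0f req/sec%n", maxThroughputPerSec(1, latencyMs));

        // 32 concurrent async requests: same latency, roughly 32x the throughput.
        System.out.printf("32 in-flight: ~%.0f req/sec%n", maxThroughputPerSec(32, latencyMs));
    }
}
```

The exact numbers don't matter; the point is that executeAsync raises concurrency without touching per-request latency.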
2
votes

Aside from switching compaction strategies (which, as Andy suggests, will certainly help your read performance, but is expensive: you will compact hard for a while after the change), you can also tune your current compaction strategy to get rid of some of the fragmentation:

  1. If you have pending compactions (nodetool compactionstats), try to catch up by raising the compaction throughput cap (nodetool setcompactionthroughput). Keep concurrent compactors to 1/2 of your CPU cores so compaction doesn't hog all your cores.
  2. Widen the buckets (increase bucket_high, drop bucket_low) - these dictate how similar in size SSTables must be to get compacted together.
  3. Drop the compaction threshold (min_threshold) - dictates how many SSTables must fit in a bucket before compaction occurs.

For details on 2 and 3, check out the compaction subproperties documentation.
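As a sketch, points 2 and 3 together amount to a schema change like the following (the values shown are illustrative only, not recommendations; the STCS defaults are bucket_low 0.5, bucket_high 1.5, and min_threshold 4):

```sql
ALTER TABLE keyspace.table2
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'bucket_low': '0.3',
                     'bucket_high': '2.0',
                     'min_threshold': '2'};
```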

Note: do not use nodetool compact. It would put the whole table into one huge SSTable, and you'd lose the benefit of compacting small slices at a time. In case of emergency, use JMX instead: invoke forceUserDefinedCompaction to force minor compactions on specific SSTables.
0
votes

You have many SSTables and slow reads. The first thing you should do is find out how many SSTables are read per SELECT.

The easiest way is to inspect the corresponding MBean: in the MBean domain "org.apache.cassandra.metrics" you will find your keyspace, below it your table, and then the SSTablesPerReadHistogram MBean. Cassandra records min, max, mean, and also percentiles.

A very good value for the 99th percentile in SSTablesPerReadHistogram is 1: it means reads normally touch only a single SSTable. If the number is about as high as the number of SSTables, Cassandra is inspecting all of them on every read. In the latter case you should double-check your SELECT, i.e. whether you are restricting it on the whole primary key or not.
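If you would rather script this than click through a JMX GUI, a minimal sketch using the JDK's built-in JMX client could look like the following (it assumes Cassandra 2.x MBean naming with type=ColumnFamily, the default JMX port 7199 with no authentication, and placeholder keyspace/table names):

```java
import java.io.IOException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SSTablesPerRead {

    // Cassandra 2.x-style metric MBean name for a table's SSTablesPerReadHistogram.
    static ObjectName histogramName(String keyspace, String table) throws Exception {
        return new ObjectName(String.format(
                "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=%s,scope=%s,name=SSTablesPerReadHistogram",
                keyspace, table));
    }

    public static void main(String[] args) throws Exception {
        // 7199 is Cassandra's default JMX port; adjust host/port for your cluster.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = histogramName("keyspace", "table2");
            // A 99thPercentile near 1 means most reads touch a single SSTable.
            System.out.println("p99 SSTables per read: "
                    + mbs.getAttribute(name, "99thPercentile"));
        } catch (IOException e) {
            // No reachable Cassandra JMX endpoint; nothing to inspect in this sketch.
            System.out.println("JMX connection failed: " + e.getMessage());
        }
    }
}
```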