1
votes

I am building an application that processes very large amounts of data (more than 3 million rows). I am new to Cassandra and am using a 5-node Cassandra cluster to store the data. I have two column families:

Table 1 : CREATE TABLE keyspace.table1 (
    partkey1 text,
    partkey2 text,
    clusterKey1 text,
    attributes text,
    PRIMARY KEY ((partkey1, partkey2), clusterKey1)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Table 2 : CREATE TABLE keyspace.table2 (
    partkey1 text,
    partkey2 text,
    clusterKey2 text,
    attributes text,
    PRIMARY KEY ((partkey1, partkey2), clusterKey2)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Note: clusterKey1 and clusterKey2 are randomly generated UUIDs.

My concern is that, according to nodetool cfstats, I am getting good throughput on Table 1 with these stats:

  • SSTable count: 2
  • Space used (total): 365189326
  • Space used by snapshots (total): 435017220
  • SSTable Compression Ratio: 0.2578485727722293
  • Memtable cell count: 18590
  • Memtable data size: 3552535
  • Memtable switch count: 171
  • Local read count: 0
  • Local read latency: NaN ms
  • Local write count: 2683167
  • Local write latency: 1.969 ms
  • Pending flushes: 0
  • Bloom filter false positives: 0
  • Bloom filter false ratio: 0.00000
  • Bloom filter space used: 352

whereas for Table 2 I am getting very bad read performance, with these stats:

  • SSTable count: 33
  • Space used (live): 212702420
  • Space used (total): 212702420
  • Space used by snapshots (total): 262252347
  • SSTable Compression Ratio: 0.1686948750752438
  • Memtable cell count: 40240
  • Memtable data size: 24047027
  • Memtable switch count: 89
  • Local read count: 24027
  • Local read latency: 0.580 ms
  • Local write count: 1075147
  • Local write latency: 0.046 ms
  • Pending flushes: 0
  • Bloom filter false positives: 0
  • Bloom filter false ratio: 0.00000
  • Bloom filter space used: 688

I was wondering why table2 creates 33 SSTables and why its read performance is so poor. Can anyone help me figure out what I am doing wrong here?

This is how I query the table:

    // selectStamt is a field (not a local), so the statement is prepared once and cached;
    // note that table2's clustering column is clusterKey2, not clusterKey1
    private BoundStatement selectStamt;

        if (selectStamt == null) {
            PreparedStatement prprdStmnt = session
                    .prepare("select * from table2 where clusterKey2 = ? and partkey1 = ? and partkey2 = ?");
            selectStamt = new BoundStatement(prprdStmnt);
        }
        synchronized (selectStamt) {
            res = session.execute(selectStamt.bind("clusterKey2Value", "partkey1Value", "partkey2Value"));
        }

In another thread, I am doing update operations on this table, on different data, in the same way.

As for throughput, I am measuring the number of records processed per second, and it is processing only 50-80 records/sec.

3
Low (sub-ms) "Local read latency" is something good. – Schildmeijer
Yeah, 500 microseconds would make almost every high-frequency trader happy :). Although cfstats (depending on the version, I think) does get reset after execution, so maybe the performance Rijo is seeing is worse than this shows. – Andy Tolbert

3 Answers

5
votes

When you have a lot of SSTables, the distribution of your data among them is very important. Since you are using SizeTieredCompactionStrategy, SSTables get compacted and merged roughly when there are 4 SSTables of about the same size.

If you are updating data within the same partition frequently and at different times, it's likely your data is spread across many SSTables, which will degrade read performance, as a single read may have to consult several SSTables.

In my opinion, the best way to confirm this is to execute cfhistograms on your table:

nodetool -h localhost cfhistograms keyspace table2

Depending on the version of Cassandra you have installed, the output will differ, but it includes a histogram of the number of SSTables consulted per read operation.

If you are updating data within the same partition frequently and at different times, you could consider using LeveledCompactionStrategy (When to use Leveled Compaction). LCS will keep data from the same partition together in the same SSTable within a level which greatly improves read performance, at the cost of more Disk I/O doing compaction. In my experience, the extra compaction disk I/O more than pays off in read performance if you have a high ratio of reads to writes.
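If you do decide to try LCS, the switch itself is a single schema change. As a sketch (the sstable_size_in_mb value is only illustrative; 160 MB is the usual default, and note that the change triggers a round of recompaction of existing SSTables):

```sql
ALTER TABLE keyspace.table2
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': '160'};
```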


EDIT: With regards to your question about your throughput concerns, there are a number of things that are limiting your throughput.

  1. A possibly big issue is that, unless you have many threads making that same query at a time, you are making your requests serially (one at a time). By doing this, you are severely limiting your throughput, as another request cannot be sent until you get a response from Cassandra. Also, since you are synchronizing on selectStamt, even if this code were executed by multiple threads, only one request could be executed at a time anyway. You can dramatically improve throughput by having multiple worker threads make the request for you (if you aren't already doing this), or even better, use executeAsync to execute many requests asynchronously. See Asynchronous queries with the Java driver for an explanation of how the request flow works in the driver and how to use the driver effectively to make many queries.
  2. If you are executing this same code each time you make a query, you are creating an extra round trip by calling session.prepare each time to create your PreparedStatement. session.prepare sends a request to Cassandra to prepare your statement. You only need to do this once, and you can reuse the PreparedStatement each time you make a query. You may be doing this already, given your statement null-checking (can't tell without more code).
  3. Instead of reusing selectStamt and synchronizing on it, just create a new BoundStatement off the single PreparedStatement you are using each time you make a query. That way no synchronization is needed at all.
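To put rough numbers on point 1, here is a plain-Java back-of-the-envelope estimate (no driver code involved; the 15 ms round-trip figure is an assumed illustration, not a measurement from the question) of how the number of in-flight requests bounds throughput, per Little's law:

```java
public class ThroughputEstimate {

    // Little's law: max requests/sec ~= in-flight requests / latency (in seconds).
    // One synchronized request at a time means latency alone caps throughput.
    static double maxThroughputPerSec(int inFlightRequests, double latencyMs) {
        return inFlightRequests * 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        // Hypothetical end-to-end latency: client -> coordinator -> replicas -> back.
        double latencyMs = 15.0;

        // Serial, synchronized requests: in the same ballpark as the
        // 50-80 records/sec observed in the question.
        System.out.printf("1 in-flight:  ~%.0f req/sec%n", maxThroughputPerSec(1, latencyMs));

        // 32 concurrent async requests: same latency, roughly 32x the throughput.
        System.out.printf("32 in-flight: ~%.0f req/sec%n", maxThroughputPerSec(32, latencyMs));
    }
}
```

The exact numbers don't matter; the point is that executeAsync raises concurrency without touching per-request latency.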
2
votes

Aside from switching compaction strategies (which, as Andy suggests, will certainly help your read performance, but is expensive: you will compact hard for a while after the change), you can also tune your current compaction strategy to get rid of some of the fragmentation:

  1. If you have pending compactions (nodetool compactionstats), try to catch up by raising the compaction throughput cap (nodetool setcompactionthroughput). Keep concurrent compactors to 1/2 of your CPU cores so compaction doesn't hog all your cores.
  2. Widen the buckets (increase bucket_high, drop bucket_low) - these dictate how similar in size SSTables must be to get compacted together.
  3. Drop the compaction threshold (min_threshold) - dictates how many SSTables must fit in a bucket before compaction occurs.

For details on 2 and 3, check out the compaction subproperties documentation.
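As a sketch, points 2 and 3 together amount to a schema change like the following (the values shown are illustrative only, not recommendations; the STCS defaults are bucket_low 0.5, bucket_high 1.5, and min_threshold 4):

```sql
ALTER TABLE keyspace.table2
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'bucket_low': '0.3',
                     'bucket_high': '2.0',
                     'min_threshold': '2'};
```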

Note: do not use nodetool compact. It would put the whole table into one huge SSTable, and you'd lose the benefit of compacting small slices at a time. In case of emergency, use JMX instead: invoke forceUserDefinedCompaction to force minor compactions on specific SSTables.
0
votes

You have many SSTables and slow reads. The first thing you should do is find out how many SSTables are read per SELECT.

The easiest way is to inspect the corresponding MBean: in the MBean domain "org.apache.cassandra.metrics" you will find your keyspace, below it your table, and then the SSTablesPerReadHistogram MBean. Cassandra records min, max, mean, and also percentiles.

A very good value for the 99th percentile in SSTablesPerReadHistogram is 1: it means reads normally touch only a single SSTable. If the number is about as high as the number of SSTables, Cassandra is inspecting all of them on every read. In the latter case you should double-check your SELECT, i.e. whether you are restricting it on the whole primary key or not.
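If you would rather script this than click through a JMX GUI, a minimal sketch using the JDK's built-in JMX client could look like the following (it assumes Cassandra 2.x MBean naming with type=ColumnFamily, the default JMX port 7199 with no authentication, and placeholder keyspace/table names):

```java
import java.io.IOException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SSTablesPerRead {

    // Cassandra 2.x-style metric MBean name for a table's SSTablesPerReadHistogram.
    static ObjectName histogramName(String keyspace, String table) throws Exception {
        return new ObjectName(String.format(
                "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=%s,scope=%s,name=SSTablesPerReadHistogram",
                keyspace, table));
    }

    public static void main(String[] args) throws Exception {
        // 7199 is Cassandra's default JMX port; adjust host/port for your cluster.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = histogramName("keyspace", "table2");
            // A 99thPercentile near 1 means most reads touch a single SSTable.
            System.out.println("p99 SSTables per read: "
                    + mbs.getAttribute(name, "99thPercentile"));
        } catch (IOException e) {
            // No reachable Cassandra JMX endpoint; nothing to inspect in this sketch.
            System.out.println("JMX connection failed: " + e.getMessage());
        }
    }
}
```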