I have just started experimenting with Cassandra, and I'm using C# and the DataStax driver (v 3.0.8). I wanted to do some performance tests to see how fast Cassandra is handling time series data.
The results are shocking in that it takes an eternity to do a SELECT, so I guess I'm doing something wrong.
I have setup Cassandra on my local computer and I have created a table:
CREATE KEYSPACE dm WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE dm.daily_data_by_day (
    symbol text,
    value_type int,
    as_of_day date,
    revision_timestamp_utc timestamp,
    value decimal,
    PRIMARY KEY ((symbol, value_type), as_of_day, revision_timestamp_utc)
) WITH CLUSTERING ORDER BY (as_of_day ASC, revision_timestamp_utc ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
I have filled this table with about 15 million rows, divided into about 10000 partitions, each containing up to 10000 rows.
Here's the test I'm running (updated at phact's request):
[Test]
public void SelectPerformance()
{
    _cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
    _stopwatch = new Stopwatch();
    var items = new[]
    {
        // 20 different items...
    };
    foreach (var item in items)
    {
        var watch = Stopwatch.StartNew();
        var rows = ExecuteQuery(item.Symbol, item.FieldType, item.StartDate, item.EndDate);
        watch.Stop();
        Console.WriteLine($"{watch.ElapsedMilliseconds}\t{rows.Length}");
    }
    Console.WriteLine($"Average Execute: {_stopwatch.ElapsedMilliseconds / items.Length}");
    _cluster.Dispose();
}
private Row[] ExecuteQuery(string symbol, int fieldType, LocalDate startDate, LocalDate endDate)
{
    using (var session = _cluster.Connect("dm"))
    {
        var ps = session.Prepare(
            @"SELECT
                symbol,
                value_type,
                as_of_day,
                revision_timestamp_utc,
                value
            FROM
                daily_data_by_day
            WHERE
                symbol = ? AND
                value_type = ? AND
                as_of_day >= ? AND as_of_day < ?");
        var statement = ps.Bind(symbol, fieldType, startDate, endDate);
        statement.EnableTracing();
        _stopwatch.Start();
        var rowSet = session.Execute(statement);
        _stopwatch.Stop();
        return rowSet.ToArray();
    }
}
The stopwatch tells me that session.Execute() takes 20-30 milliseconds (update: after changing the code to create the cluster only once, I'm down to about 15 milliseconds). So I enabled some tracing and got the following result:
activity | source_elapsed
--------------------------------------------------------------------------------------------
Parsing SELECT symbol, value_type, as_of_day, revision_timestamp_utc,...; | 47
Preparing statement | 98
Executing single-partition query on daily_data_by_day | 922
Acquiring sstable references | 939
Skipped 0/5 non-slice-intersecting sstables, included 0 due to tombstones | 978
Bloom filter allows skipping sstable 74 | 1003
Bloom filter allows skipping sstable 75 | 1015
Bloom filter allows skipping sstable 72 | 1024
Bloom filter allows skipping sstable 73 | 1032
Key cache hit for sstable 63 | 1043
Merged data from memtables and 5 sstables | 1329
Read 100 live and 0 tombstone cells | 1353
If I understand this trace correctly, Cassandra spends less than 1.4 milliseconds executing my query. So what is the DataStax driver doing the rest of the time?
(As a reference, I have run the same performance test against a local SQL Server instance; there, executing the same query from C# takes about 1-2 milliseconds.)
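For completeness, here is a minimal sketch of what the "create the cluster only once" change can look like. It assumes NUnit (hence the [OneTimeSetUp] attribute) and the v3 driver API; the field and method names are illustrative, not from my actual test code. The key points are that Cluster.Connect() and Prepare() each run once, and every query reuses the same ISession and PreparedStatement:

```csharp
private Cluster _cluster;
private ISession _session;
private PreparedStatement _ps;

[OneTimeSetUp]
public void SetUpFixture()
{
    // Building the cluster and connecting the session are expensive; do them once.
    _cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
    _session = _cluster.Connect("dm");

    // Preparing the statement also costs a round trip; do it once as well.
    _ps = _session.Prepare(
        @"SELECT symbol, value_type, as_of_day, revision_timestamp_utc, value
          FROM daily_data_by_day
          WHERE symbol = ? AND value_type = ? AND as_of_day >= ? AND as_of_day < ?");
}

private Row[] ExecuteQuery(string symbol, int fieldType, LocalDate startDate, LocalDate endDate)
{
    // Bind reuses the already-prepared statement; only the query itself hits the server.
    return _session.Execute(_ps.Bind(symbol, fieldType, startDate, endDate)).ToArray();
}

[OneTimeTearDown]
public void TearDownFixture()
{
    _cluster.Dispose();
}
```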
Update:
I have attempted to do some profiling, which is not that easy with asynchronous code that you don't own...
My conclusion is that most of the time is spent parsing the response. Each response contains between 2,000 and 3,000 rows, and parsing takes about 9 ms per response. Deserialization takes most of that time, about 6.5 ms, with decimal being the worst at about 3 ms per field. The other field types (text, int, date and timestamp) take about 0.5 ms per field.
Looking at my measured times, I ought to have suspected this: the more rows in the response, the longer it takes, and almost linearly.
Comment: Besides the best practices (reusing the Session instance across your app / reusing the prepared statement / ...), you are only measuring the latency of the execution of a single synchronous query multiple times. Instead, you should execute multiple queries in parallel (via async methods or using multiple tasks and a scheduler) and analyze how latencies behave and what the throughput is. Here you have an example: github.com/riptano/csharp-driver-sut - jorgebg