What is the fastest way to load a wide row from Cassandra to C#?

Question

What is the most performance efficient way to load a single (or a few) wide rows from Cassandra to C#? My wide rows have 10.000-100.000 columns. The primary keys consists of several values but the column key is a single string and the column value is a single counter (see the schema below).

Using "tracing on" in the cqlsh I can see that Cassandra can select a wide row with 17.000 columns in 44 m, but loading this data all the way into C# using the Datastax driver takes 700 ms. Is there a faster way? I need to load the full wide row in 50-100ms. (Is there a more native way? A way minimizing the network traffic? A faster driver? Another configuration of the driver? Or something else?)

I actually do not need all 17.000 columns. I just need the columns where ‘support’ >= 2 or the top 1000 columns sorted descending by ‘support’. But since ‘support’ is my column value I don't know of any way to query like this in CQL.

This is my table:

CREATE TABLE real_time.grouped_feature_support (
    algorithm_id int,
    group_by_feature_id int,
    select_feature_id int,
    group_by_feature_value text,
    select_feature_value text,
    support counter,
    PRIMARY KEY ((algorithm_id, group_by_feature_id, select_feature_id, group_by_feature_value), select_feature_value)

This is my way to access the data using the Datastax driver:

var table = session.GetTable<GroupedFeatureSupportDataEntry>();
var query = table.Where(x => x.CustomerAlgorithmId == customerAlgorithmId
    && x.GroupByFeatureId == groupedFeatureId
    && myGroupedFeatureValues.Contains(x.GroupByFeatureValue)
    && x.GroupByFeatureValue == groupedFeatureValue
    && x.SelectFeatureId == selectFeatureId)
    .Select(x => new
    {
        x.GroupByFeatureValue,
        x.SelectFeatureValue,
        x.Support,
    })
    .Take(1000000);
var result = query.Execute();

jorgebg jorgebg · Accepted Answer · 2014-12-16T10:49:49

If you are looking for the best performance when retrieving a large result set you should not use a mapping component like Linq-to-cql or any other.

You can retrieve the rows using the technique documented on the driver readme, in your case it would be something like:

var query = "SELECT * from grouped_feature_support WHERE" + 
            " algorithm_id = ? AND group_by_feature_id = ? " +
            " AND select_feature_id = ? AND group_by_feature_value = ?";
//Prepare the query once in your application lifetime
var ps = session.Prepare(query);
//Reuse the prepared statement by binding different parameters to it
var rs = session.Execute(ps.Bind(parameters));
foreach (var row in rs)
{
  //The enumerator will yield all the rows from Cassandra
  //Retrieving them in the back in blocks of 5000 (determined by the pagesize).
}
//You can also use a IEnumerable<T> Linq Extensions to filter
var filteredRows = rs.Where(r => r.GetValue<long>("support") > 2);

What is the fastest way to load a wide row from Cassandra to C#?

1 Answers