What is the most performant way to load a single wide row (or a few) from Cassandra into C#? My wide rows have 10,000-100,000 columns. The primary key consists of several values, but the column key is a single string and the column value is a single counter (see the schema below).
Using "tracing on" in cqlsh, I can see that Cassandra can select a wide row with 17,000 columns in 44 ms, but loading this data all the way into C# using the DataStax driver takes 700 ms. Is there a faster way? I need to load the full wide row in 50-100 ms. (Is there a more native way? A way to minimize the network traffic? A faster driver? A different driver configuration? Or something else?)
I actually do not need all 17,000 columns. I just need the columns where ‘support’ >= 2, or the top 1000 columns sorted descending by ‘support’. But since ‘support’ is my column value, I don't know of any way to express this query in CQL.
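To make the requirement concrete, here is a minimal sketch of the selection I am after, done client-side on columns that are already loaded. The helper names `WithMinSupport` and `TopBySupport`, the tuple shape, and the sample data are all made up for illustration; none of this is part of the driver API, and it does nothing about the load time itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keep only the columns whose counter is at least minSupport.
static List<(string SelectFeatureValue, long Support)> WithMinSupport(
    IEnumerable<(string SelectFeatureValue, long Support)> columns, long minSupport) =>
    columns.Where(c => c.Support >= minSupport).ToList();

// Hypothetical helper: keep the topN columns, ordered by counter descending.
static List<(string SelectFeatureValue, long Support)> TopBySupport(
    IEnumerable<(string SelectFeatureValue, long Support)> columns, int topN) =>
    columns.OrderByDescending(c => c.Support).Take(topN).ToList();

// Made-up sample data standing in for the columns of one loaded wide row.
var columns = new List<(string SelectFeatureValue, long Support)>
{
    ("a", 1), ("b", 5), ("c", 2), ("d", 3),
};

var frequent = WithMinSupport(columns, 2);   // support >= 2
var top2 = TopBySupport(columns, 2);         // two highest counters

Console.WriteLine(string.Join(", ", frequent.Select(c => c.SelectFeatureValue)));
Console.WriteLine(string.Join(", ", top2.Select(c => c.SelectFeatureValue)));
```

Both variants still require the whole row to reach the client first, which is exactly the cost I would like to avoid.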
This is my table:
CREATE TABLE real_time.grouped_feature_support (
    algorithm_id int,
    group_by_feature_id int,
    select_feature_id int,
    group_by_feature_value text,
    select_feature_value text,
    support counter,
    PRIMARY KEY ((algorithm_id, group_by_feature_id, select_feature_id, group_by_feature_value), select_feature_value)
);
This is how I access the data using the DataStax driver:
var table = session.GetTable<GroupedFeatureSupportDataEntry>();
var query = table.Where(x => x.CustomerAlgorithmId == customerAlgorithmId
        && x.GroupByFeatureId == groupedFeatureId
        && myGroupedFeatureValues.Contains(x.GroupByFeatureValue)
        && x.GroupByFeatureValue == groupedFeatureValue
        && x.SelectFeatureId == selectFeatureId)
    .Select(x => new
    {
        x.GroupByFeatureValue,
        x.SelectFeatureValue,
        x.Support,
    })
    .Take(1000000);
var result = query.Execute();