I have a Cassandra column family with a lot of dynamic columns. I am running a simple Spark-Cassandra connector example that tries to fetch all the data from this table. The problem is that it does not fetch any of the dynamic columns from my column family.

With the code snippet and table definition below, the connector returns the primary key and the secondary index column for every row, but none of the other columns (the column family has 30+ more dynamic columns). Based on my reading here (Spark Datastax Java API Select statements), I suspect the connector currently supports fetching only partition and clustering keys as columns. Could someone please confirm whether my understanding is correct? It would be great if someone could suggest how to fix this.

/**
 * Loads a cassandra column family as a spark RDD.
 */
public static CassandraJavaRDD<CassandraRow> getCassandraTableRDD(
        JavaSparkContext context, String keyspace, String table)
{
    return javaFunctions(context).cassandraTable(keyspace, table);
}

CREATE TABLE source_product_canonical_data_sample (
  'key' text PRIMARY KEY,
  source text
) WITH
  comment='' AND
  comparator=text AND
  read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  default_validation=text AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='LZ4Compressor';

1 Answer

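Your CQL table definition is not aware of your "dynamic columns": it declares only key and source, and it has no compound primary key with clustering columns. The connector reads the table through CQL, so it only sees the columns that the definition declares. Dynamic columns and wide rows are terms from the old Thrift data model; in CQL they have been replaced by compound primary keys, where each dynamic column becomes its own row identified by a clustering column.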

See this excellent blog post by Jonathan Ellis explaining how to transition to the new data model: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
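
To make the mapping concrete, here is a minimal, self-contained Java sketch of how a Thrift-style wide row could be viewed as CQL rows under a compound primary key such as PRIMARY KEY (key, column1). It does not touch Cassandra or the connector. The class name WideRowSketch, the CqlRow type and the sample data are hypothetical, and the column names column1 and value are assumed here for illustration; check what your own migrated table actually exposes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: models the data layout, not the connector API.
public class WideRowSketch {

    /** One CQL row under PRIMARY KEY (key, column1): partition key, clustering column, value. */
    public static final class CqlRow {
        public final String key;      // partition key (the Thrift row key)
        public final String column1;  // clustering column (the dynamic column name), assumed name
        public final String value;    // the dynamic column's value, assumed name

        public CqlRow(String key, String column1, String value) {
            this.key = key;
            this.column1 = column1;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + " | " + column1 + " | " + value;
        }
    }

    /**
     * Flattens Thrift-style wide rows (row key -> dynamic columns sorted by name)
     * into one CQL-style row per dynamic column.
     */
    public static List<CqlRow> flatten(Map<String, TreeMap<String, String>> wideRows) {
        List<CqlRow> rows = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> partition : wideRows.entrySet()) {
            for (Map.Entry<String, String> column : partition.getValue().entrySet()) {
                rows.add(new CqlRow(partition.getKey(), column.getKey(), column.getValue()));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample data is made up for the sketch.
        TreeMap<String, String> columns = new TreeMap<>();
        columns.put("title", "Widget");
        columns.put("brand", "Acme");
        columns.put("price", "9.99");

        Map<String, TreeMap<String, String>> wideRows = new LinkedHashMap<>();
        wideRows.put("product-1", columns);

        // Each dynamic column is printed as its own (key, column1, value) row.
        for (CqlRow row : flatten(wideRows)) {
            System.out.println(row);
        }
    }
}
```

Once the table is declared with a clustering column along these lines, each former dynamic column shows up as an ordinary row, which is the shape the connector can read.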