Spark Cassandra connector - where clause

Question

I am trying to run analytics on time series data stored in Cassandra, using Spark and the new connector published by DataStax.

In my schema the partition key is the meter ID, and I want to run Spark operations only on specific series, so I need to filter by meter ID.

I would then like to run a query like: SELECT * FROM timeseries WHERE series_id = X

I have tried to achieve this by doing:

JavaRDD<CassandraRow> rdd = sc.cassandraTable("test", "timeseries").select(columns).where("series_id = ?",ids).toJavaRDD();

When executing this code the resulting query is:

SELECT "series_id", "timestamp", "value" FROM "timeseries" WHERE token("series_id") > 1059678427073559546 AND token("series_id") <= 1337476147328479245 AND series_id = ? ALLOW FILTERING

A token-range clause on my partition key (token("series_id") > X AND token("series_id") <= Y) is added automatically, and my own clause is appended after it. This does not work, and I get an error saying: "series_id cannot be restricted by more than one relation if it includes an Equal".

Is there a way to get rid of the clause added automatically? Am I missing something?

Thanks in advance

rs_atl · Accepted Answer · 2014-07-28T15:01:48

The driver automatically determines the partition key using table metadata it fetches from the cluster itself. It then uses this to append the token ranges to your CQL so that it can read a chunk of data from the specific node it's trying to query. In other words, Cassandra thinks series_id is your partition key and not meter_id. If you run a describe command on your table, I bet you'll be surprised.
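
To make the mechanism concrete, here is a minimal sketch in plain Java of how a token-range restriction and a user-supplied where clause could end up in the same statement. It is not the connector's actual code: the class TokenRangeQuerySketch and the method buildQuery are hypothetical names invented for illustration, and the token bounds are simply the ones from the query in the question. It only shows why an equality on the column the driver treats as the partition key collides with the generated token() predicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only: this is NOT the connector's implementation.
class TokenRangeQuerySketch {

    // Builds a CQL string shaped like the one in the question: the token-range
    // predicates on the partition key come first, then the user's clause.
    static String buildQuery(String table, List<String> columns, String partitionKey,
                             long startToken, long endToken, String userWhere) {
        // Quote each selected column name.
        String cols = columns.stream()
                .map(c -> "\"" + c + "\"")
                .collect(Collectors.joining(", "));

        List<String> predicates = new ArrayList<>();
        // One chunk of the ring, derived from the partition key in the table metadata.
        predicates.add("token(\"" + partitionKey + "\") > " + startToken);
        predicates.add("token(\"" + partitionKey + "\") <= " + endToken);
        // The user's own restriction is appended as-is.
        if (userWhere != null && !userWhere.isEmpty()) {
            predicates.add(userWhere);
        }

        return "SELECT " + cols + " FROM \"" + table + "\" WHERE "
                + String.join(" AND ", predicates) + " ALLOW FILTERING";
    }

    public static void main(String[] args) {
        // Token bounds copied from the query in the question.
        String cql = buildQuery("timeseries",
                Arrays.asList("series_id", "timestamp", "value"),
                "series_id",
                1059678427073559546L, 1337476147328479245L,
                "series_id = ?");
        System.out.println(cql);
    }
}
```

In this sketch the partition key name is passed in explicitly. In the situation described above it comes from the table metadata, which is why the column named in the token() predicates tells you what Cassandra considers the partition key.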
