I have a table in Cassandra where date is not part of the partition key but is part of the clustering key. While reading the table in Spark I apply a date filter, and it is pushed down. I want to understand how this pushdown works, because through CQL we cannot query directly on a clustering key (without restricting the partition key or using ALLOW FILTERING). Is the data being filtered somewhere else?
Implementation in Java:
transactions.filter(transactions.col("timestamp").gt(timestamp)) // column "timestamp" is of type timestamp
and the physical plan comes out as:
== Physical Plan ==
*Project [customer_user_id#67 AS customerUserId#111, cast(timestamp#66 as date) AS date#112, city#70]
+- *Filter (isnotnull(timestamp#66) && isnotnull(city#70))
+- *Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@571db8b4 [customer_user_id#67,timestamp#66,city#70] PushedFilters: [IsNotNull(timestamp), *GreaterThan(timestamp,2018-08-13 00:00:00.0), IsNotNull(city)], ReadSchema: struct<customerUserId:int,date:date,city:string>
This worked fine for the timestamp column, but if the column is of type date, the filter was not pushed down even when date was part of the partition key. I had to write it as

transactions.filter("date >= cast('" + timestamp + "' as date)")

to make it work (column date is of type date).
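For reference, a minimal sketch of how that workaround string is assembled (the helper name `dateFilterExpr` is hypothetical, not part of the connector's API); it makes the quoting and the space before `as` explicit, since the expression is passed to Spark SQL as a raw string:

```java
// Sketch only: builds the SQL expression string used in the date workaround above.
public class DateFilterExpr {

    // Hypothetical helper: wraps a timestamp string in a Spark SQL cast to date.
    static String dateFilterExpr(String timestamp) {
        return "date >= cast('" + timestamp + "' as date)";
    }

    public static void main(String[] args) {
        // Produces: date >= cast('2018-08-13 00:00:00.0' as date)
        System.out.println(dateFilterExpr("2018-08-13 00:00:00.0"));
    }
}
```

The resulting string would then be passed as transactions.filter(dateFilterExpr(timestamp)).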