I am working on Spark 1.2.1 with the datastax/spark-cassandra-connector and a C* table filled with 1B+ rows (DataStax Enterprise, DSE 4.7.0). I need to perform a range filter/where query on a timestamp column.
What is the best way to do this without loading the whole 1B+ row table into Spark's memory (which could take hours) and instead effectively push the query down to C*?
Using an RDD with joinWithCassandraTable, or using a DataFrame with predicate pushdown? Is there something else? Below is a rough sketch of the two options I have in mind.
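For reference, this is roughly what I mean by the two approaches. The keyspace, table, and column names (`ks`, `events`, `sensor_id`, `ts`) are placeholders for my actual schema, and I'm assuming `ts` is a clustering column:

```scala
import java.text.SimpleDateFormat
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object RangeFilterSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("range-filter-sketch"))

    val fmt  = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ")
    val from = fmt.parse("2015-06-01 00:00:00+0000")
    val to   = fmt.parse("2015-06-02 00:00:00+0000")

    // Option 1: server-side filtering with .where().
    // The predicate is added to the CQL the connector issues per token range,
    // so only matching rows come back to Spark. As far as I understand,
    // "ts" needs to be a clustering column (or indexed) for C* to accept it.
    val filtered = sc.cassandraTable("ks", "events")
      .where("ts >= ? AND ts < ?", from, to)
    println(filtered.count())

    // Option 2: joinWithCassandraTable, if I already know which partitions
    // I need. Each key in the driving RDD becomes a direct CQL read,
    // avoiding a full table scan. "sensor-42"/"sensor-43" are made-up keys.
    val keys = sc.parallelize(Seq(Tuple1("sensor-42"), Tuple1("sensor-43")))
    val joined = keys.joinWithCassandraTable("ks", "events")
    println(joined.count())

    sc.stop()
  }
}
```

Is one of these clearly better for a pure timestamp range over the whole table, or is there a different approach I'm missing?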