I have a table in Cassandra where A (String) and B (int) together form the partition key. I am running the following SQL query in Spark SQL:
SELECT * FROM table WHERE A IN ('221', ...) AND B IN (32, 323, ...);
In the explain plan, it appears to be doing a batch scan instead of a direct join on the partition keys:
== Physical Plan ==
Project [A,B ... other columns]
+- BatchScan[A,B ... other columns] Cassandra Scan: dev.table
- Cassandra Filters: [["A" IN (?, ?, ?, ?), D],["B" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?), D]]
- Requested Columns: [A,B ...]
Also, according to the documentation at https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md, spark.cassandra.sql.inClauseToJoinConversionThreshold is set to 25.
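As far as I understand that documentation, the conversion to a join is supposed to happen when the cross product of the IN-list sizes exceeds the threshold. Here is a rough sketch of that arithmetic for the plan above, assuming this reading is correct and that the number of ? placeholders in the Cassandra Filters line equals the number of IN values (the helper name is made up for illustration and is not part of the connector):

```python
from math import prod

def in_clause_cross_product(in_list_sizes):
    """Hypothetical helper: cardinality of the cross product of all IN lists."""
    return prod(in_list_sizes)

# Sizes taken from the plan: "A" IN (4 placeholders), "B" IN (10 placeholders).
sizes = [4, 10]
threshold = 25  # value of inClauseToJoinConversionThreshold mentioned above

cardinality = in_clause_cross_product(sizes)
exceeds = cardinality > threshold  # assumed comparison, per my reading of the docs

print(cardinality, exceeds)
```

If that reading is right, this query is above the threshold, yet the plan still shows a BatchScan.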
I am curious whether there is any scenario in which an IN clause on the primary key would perform better than a direct join.