We are exploring Spark with Cassandra in order to overcome limitations of CQL.
We initially used only CQL, but ran into a few roadblocks compared with an RDBMS. To name a few:
- To compare a column with > (greater than) or < (less than), the column must be part of the clustering key. Even when it is a clustering column, I still have to provide the partition key to use < or > on it.
- Can't check any column value for NULL
- To query on any column other than the partition key, we have to create an index on that column
- Can't ORDER BY a column that isn't a clustering key
- Limited GROUP BY support
- Can't join tables
I am new to Cassandra and end up revisiting my schema often because of these limitations.
Hence, in the same way that Hive/Pig sit on top of HDFS, what additional benefits does Spark give over CQL?