Cassandra - WHERE clause with non primary key disadvantages

Question

I am new to Cassandra and I am using it for analytics tasks (good indexing needed).

I read in this post (and others), "cassandra, select via a non primary key", that I can't query my DB on non-primary-key columns in a WHERE clause.

To do so, it seems that there are 3 possibilities (all with major disadvantages):

Create a secondary index (not recommended because of performance issues).
Create a new table (I don't want redundant data, even if that's OK with Cassandra).
Put the column I want to query by in the primary key; in this case I need to specify all the parts of the primary key in my WHERE clause, and I can't use operators other than IN or =.

Is there another way to do what I am trying to do (a WHERE clause on a non-primary-key column) without the 3 constraints above?

Cassandra really isn't a good fit for the use case that you are describing. It sounds like you need query flexibility, and you simply will not get that out of Cassandra. The bottom line is that the recommendation to create query tables (with redundant data) is a scalable solution, whereas trying to use Cassandra like a relational database is not. — Aaron
Hi @Aaron, oops. The problem is that MongoDB is recommended over Cassandra for query flexibility, but read/write performance is highly important in my case, and MongoDB is very bad on that point. — farhawa
And the only way you will ever see that performance is to take a query-based modeling approach using redundant data. Cassandra performs pretty terribly when you try to use a relational model or similar methods to achieve query flexibility. — Aaron
I would suggest watching this course from DataStax on data modeling. Along with the Core Concepts course, it provides a pretty solid foundation: academy.datastax.com/courses/ds220-data-modeling — bechbd

bechbd · Accepted Answer · 2016-02-20T18:45:40

From within Cassandra itself, you are limited to the options that you have specified above. If you want to know why, take a look here:

A Deep Look to the CQL Where Clause
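
To make the trade-off concrete, here is a toy in-memory sketch in Python. It is not Cassandra's storage engine, and every name in it (`QueryTable`, `users_by_id`, `users_by_country`, the columns) is hypothetical. It only illustrates why a lookup by partition key is cheap, why a filter on a non-key column has to visit every partition, and how a second query table trades redundant writes for fast reads.

```python
from collections import defaultdict


class QueryTable:
    """Toy stand-in for a Cassandra table: rows grouped by a partition key."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)

    def insert(self, row):
        self.partitions[row[self.partition_key]].append(row)

    def by_key(self, value):
        # Touches exactly one partition, like WHERE <partition key> = value.
        return list(self.partitions.get(value, []))

    def scan(self, column, value):
        # Visits every partition: roughly what filtering on a
        # non-key column costs.
        return [
            row
            for rows in self.partitions.values()
            for row in rows
            if row[column] == value
        ]


# Hypothetical schema: the same rows are written to two tables, one per query.
users_by_id = QueryTable("user_id")
users_by_country = QueryTable("country")


def save_user(row):
    # The redundant write is the price of the query-table approach.
    for table in (users_by_id, users_by_country):
        table.insert(row)


save_user({"user_id": 1, "name": "ana", "country": "FR"})
save_user({"user_id": 2, "name": "bo", "country": "US"})
save_user({"user_id": 3, "name": "cy", "country": "FR"})

fr_fast = users_by_country.by_key("FR")      # one partition read
fr_slow = users_by_id.scan("country", "FR")  # full scan of all partitions
```

Both calls return the same rows; the difference is how much data has to be touched to find them, and that difference grows with the size of the cluster.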

However, if you are trying to run analytics on information stored within Cassandra, have you looked at using Spark? Spark is built for large-scale data processing on distributed systems. In fact, DataStax (see here) offers some nice integration features between Spark and Cassandra, specifically for loading and saving data. It has both free (Community) and paid (Enterprise) editions.
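
As a rough sketch of what that could look like from PySpark, assuming the Spark Cassandra Connector is on the classpath and that `spark` is an already-created session: the keyspace `analytics`, the table `users`, and the columns `age` and `country` are hypothetical placeholders, not anything from your schema.

```python
def cassandra_options(keyspace, table):
    """Options passed to the Spark Cassandra Connector's DataFrame source."""
    return {"keyspace": keyspace, "table": table}


def load_table(spark, keyspace, table):
    """Load a Cassandra table as a Spark DataFrame.

    `spark` is assumed to be an existing session started with the
    connector available; creating it is left out of this sketch.
    """
    return (
        spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(**cassandra_options(keyspace, table))
        .load()
    )


def count_by_country(spark):
    # Hypothetical keyspace, table, and columns.
    users = load_table(spark, "analytics", "users")
    # Filtering on a non-primary-key column is fine here, because Spark
    # processes the table in a distributed job rather than a single CQL query.
    return users.where("age > 30").groupBy("country").count()
```

This keeps the Cassandra tables modeled around your write path and known queries, and moves the ad hoc filtering into Spark.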
