Am I violating the data modelling rule in Cassandra?

Question

I understand that we should not create 'N' partitions under a single table, because a query then has to read from the N nodes where those partitions live.

(I have modified the example for clarity and security.)

Suppose I have a table like 'user':

CREATE TABLE user(
   user_id int PRIMARY KEY,
   user_name text,
   user_phone varint
   );

where user_id is unique.

For example, to get all the users from the table, I use the query:

select * from user;

This means the query goes to every node that holds partitions for 'user_id'. Since I used user_id as the partition / primary key here, the rows are scattered across all the nodes based on the partition key.

Is this fine, or is there a better way to design this in Cassandra?

Edit:

Using a single constant partition key value, 'uniquekey', and sorting by user_name has the advantage that all rows end up in one partition. Is this a better design compared to the one above?

CREATE TABLE user(
   user_id text,      -- every row stores the same constant value, 'uniquekey'
   user_name text,
   user_phone varint,
   PRIMARY KEY (user_id, user_name));



select * from user where user_id = 'uniquekey';

Distributed databases like Cassandra aim to scale horizontally and hence require partition-key-based queries. Cassandra isn't designed to serve "select *" type queries. Also, the alternate data model provided above is worse, as the entire data set would end up on a single node, so it is not recommended. — dilsingi
There are bucketing strategies that split the partition so that all 1000 entries do not sit together. The recommendation is to keep a partition size < 100MB (though theoretically one could store a lot more). Just curious, why would an app need all 1000 entries to be served out (a.k.a. "select *")? — dilsingi
The short answer is "yes, you are violating modeling rules with Cassandra." Postgres is probably a much better fit for this. — Aaron
Having to employ extra engineering to do something simple is a good indication that you are using the wrong data store. Also, using a "constant" partition key creates hotspots in your cluster. — Aaron
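
As a rough sketch of the bucketing idea raised in the comments above: the table name user_by_bucket, the bucket column, and the bucket count of 16 are all assumptions made for illustration and do not come from the thread. The application would compute the bucket itself on every write (for example, user_id % 16), so rows spread over 16 bounded partitions instead of one constant partition or one partition per user.

```
-- Hypothetical bucketed variant of the 'user' table.
-- 'bucket' is computed by the application, e.g. user_id % 16.
CREATE TABLE user_by_bucket (
   bucket int,
   user_id int,
   user_name text,
   user_phone varint,
   PRIMARY KEY ((bucket), user_id)
);

-- Read one bucket at a time; each query touches a single partition.
SELECT * FROM user_by_bucket WHERE bucket = 0;
```

Reading every user under this sketch takes one query per bucket (0 through 15), which is exactly the kind of extra engineering the last comment warns about.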

Ev3rlasting · Accepted Answer · 2018-01-24T09:45:43

A fundamental table design rule in Cassandra is query-driven modeling, which means you work out which queries you need to run before you define the table schema.
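
To make the query-driven idea concrete, here is a minimal sketch. It assumes the application needs the query "find users by name", which the question does not state; the table name user_by_name is likewise hypothetical. The schema is derived from that query, so the column being filtered on becomes the partition key.

```
-- Hypothetical table built for one query: look up users by name.
-- user_id is a clustering column so users sharing a name stay distinct.
CREATE TABLE user_by_name (
   user_name text,
   user_id int,
   user_phone varint,
   PRIMARY KEY ((user_name), user_id)
);

-- The query this table was designed for; it reads a single partition.
SELECT user_id, user_phone FROM user_by_name WHERE user_name = 'alice';
```

A different query, such as lookup by phone number, would typically get its own table with its own partition key rather than reusing this one.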

If you simply want to return all the rows (select *) in the database, which is not a common use case for Cassandra since it is designed to store very large amounts of data, whatever you designed is fine. But Cassandra might not be the best choice in this case.

How to ensure a good table design in Cassandra? Ref: Basic Rules of Cassandra Data Modeling
