Cassandra Defining Primary key and alternatives

Question

Here is a simple example of the user table in cassandra. What is best strategy to create a primary key.

My requirements are

search by uuid
search by username
search by email

All the keys mentioned will be high cardinality keys. Also at any moment I will be having only one of them to search

PRIMARY KEY(uid,username,email)

What if I have only the username ?, Then the above primary key is not use ful. I am not able visualize a solution to achieve this using compound primary key?

what are other options? should we go with a new table with username to uid, then search the user table. ?

From all articles out there on the internet recommends not to create secondary index for high cardinality keys

CREATE TABLE medicscity.user (
    uid uuid,
    fname text,
    lname text,
    user_id text,
    email_id text,
    password text,
    city text,
    state_id int,
    country_id int,
    dob timestamp,
    zipcode text,
    PRIMARY KEY (??)
)

How do we solve this kind of situation ?

Abhishek Anand Abhishek Anand · Accepted Answer · 2016-04-03T03:56:37

Yes, you need to go with duplicate tables.

If ever in Cassandra you face a situation in which you will have to query a table based on column1, column2 or column3 independently. You will have to duplicate the tables.

Now, how much duplication you have to use, is individual choice.

Like, in this example, you can either duplicate table with full data. Or, you can simply create a new table column1 (partition), column2, column 3 as primary key in main table. Create a new table with primary key of column1, column2, column3 and partition key on column2. Another one with same primary key and partition key on column3.

So, your data duplicate will be row, but in this case you will end up querying data twice. One from duplicate table, and one from full fledged table.

Big data technology, is there to speed up computation and let your system scale horizontally, and it comes at the expense of disk/storage. I mean just look at everything, even its base of replication factor does duplication of data.

Cassandra Defining Primary key and alternatives

2 Answers