Confusion over data model in Cassandra

Question

Hello, we have a table in Cassandra with the structure below:

CREATE TABLE dmp.user_profiles_6 (
    vuid text PRIMARY KEY,
    brand_model text,
    first_seen timestamp,
    last_seen timestamp,
    total_day_count int,
    total_usage_count int,
    user_type text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.1
    AND speculative_retry = '99PERCENTILE';

I read a few articles about data modeling in Cassandra from DataStax. They say that a primary key consists of a partition key and a clustering key.

In the table above, the vuid column is an identifier for every unique user, and it is the primary key. We have 400M unique users. Does that mean Cassandra is creating 400M partitions? If so, I would expect that to degrade performance. Yet one DataStax article about data modeling shows an example table whose primary key is a uuid column, which is unique and has very high cardinality. I am totally confused. Can anyone help me identify which column should be the partition key and which should be the clustering key?

Queries can be as below:
1. Select record directly on basis of vuid
2. Select vuids on basis of range of last seen or first seen

Dip Dip · Accepted Answer · 2016-08-08T17:42:18

Select record directly on basis of vuid >> Your table already does that. vuid is the primary key and, as the only key column, also the partition key.

Select vuids on basis of range of last seen or first seen >> There are two options here.

Either add last_seen or first_seen as clustering columns (you can do range selection on clustering columns only). In this case you still need to provide vuid in every query, along with the last_seen or first_seen range. I don't think you want that.

OR

Create another table which has the same data. (Yes, in C* we create another table for each different query, with the same data and the keys changed to suit the query. Welcome to data duplication.) In this table you add a dummy column as the partition key and make last_seen and first_seen the clustering columns. You pass the seen dates in the query to fetch vuid.
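A rough sketch of the second option, in case it helps. The table name `user_profiles_by_last_seen` and the `day_bucket` column are made up for illustration; `day_bucket` plays the role of the dummy partition key and is assumed to be filled in by the application from last_seen, so that all 400M rows do not end up in a single partition. Only last_seen is used as the range column here, because a range restriction is only allowed on the last clustering column you restrict; a first_seen range would need its own table built the same way.

```sql
-- Query 1: direct lookup, served by the existing table.
SELECT * FROM dmp.user_profiles_6 WHERE vuid = 'some-vuid';

-- Query 2: hypothetical lookup table (name and day_bucket are assumptions).
CREATE TABLE dmp.user_profiles_by_last_seen (
    day_bucket text,       -- e.g. '2016-08-08', derived from last_seen by the application
    last_seen timestamp,   -- clustering column, so range predicates are allowed
    vuid text,             -- keeps rows unique when two users share a timestamp
    PRIMARY KEY ((day_bucket), last_seen, vuid)
);

-- Range query inside one bucket; the partition key must be given by equality.
SELECT vuid
FROM dmp.user_profiles_by_last_seen
WHERE day_bucket = '2016-08-08'
  AND last_seen >= '2016-08-08 00:00:00+0000'
  AND last_seen <  '2016-08-08 12:00:00+0000';
```

Keep in mind that last_seen is part of the primary key in this table, so it cannot be updated in place. When a user is seen again, the application would have to delete the old row and insert a new one.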

Hope this is clear.
