0
votes

I am new to Cassandra and have a few novice-level questions about the primary key.

  1. Is the primary key supposed to be unique per record? (My guess would be not.) To elaborate, suppose my table looks like this:
    CREATE TABLE user_action (
    user_id int,
    action text,
    date_of_action date,
    PRIMARY KEY (user_id)
    )

I am guessing I can have multiple rows with the same user_id

  2. If primary key is not one per record, can a primary key be split across many partitions?

  3. Can a partition have multiple primary keys?

  4. Is the primary key itself decided to pick the partition or is the hashCode of the primary key used to pick a partition?

  5. Is it fair to think of a partition as a file?

2 Answers

0
votes

The primary key and the partition key are the same in some cases but not always; it depends on how many columns make up the primary key. Data is distributed across the Cassandra cluster based on the partition key, and each partition key value identifies exactly one partition. I am not going to explain every scenario and concept here, but you should go through the documentation; I am sure you will understand it quickly after reading the links below.

https://www.datastax.com/blog/2016/02/most-important-thing-know-cassandra-data-modeling-primary-key

https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useCompoundPrimaryKeyConcept.html

-1
votes

1>Is the primary key supposed to be unique per record? (My guess would be not.) To elaborate, suppose my table looks like this:

    CREATE TABLE user_action (
    user_id int,
    action text,
    date_of_action date,
    PRIMARY KEY (user_id)
    )

The primary key is supposed to be unique per record/row. With the table you posted, you can have only one row per user_id: writing a second row with the same user_id overwrites the first, because a Cassandra INSERT is an upsert. To allow multiple rows with the same user_id, you have to add a differentiating column. This column is called a clustering key in Cassandra, and it forms part of the primary key.

The primary key is the combination of the partition key and any clustering key(s). Cassandra uses the partition key to locate a partition. If the data model defines a clustering key, it distinguishes the rows within that partition. If no clustering key is defined, as in your case, each partition holds a single row, so only one record is kept per user_id.

In the example below you can have several records with the same user_id, one for each state. Here the primary key is the combination (user_id, state): user_id is the partition key and state is the clustering key.

CREATE TABLE user_action (
user_id int,
state text, 
action text,
date_of_action date,
PRIMARY KEY (user_id,state)
)

I am guessing I can have multiple rows with the same user_id

As explained above, you can have multiple rows with the same user_id if you define a clustering key. With the table you quoted, it is not possible.
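To make the uniqueness rule concrete, here is a toy Python model of a table declared with PRIMARY KEY (user_id, state). It is only a sketch of the semantics, not Cassandra code and not how Cassandra stores data internally; the names `ToyTable`, `insert`, and `rows_in_partition` are made up for illustration.

```python
class ToyTable:
    """Toy model of a table with PRIMARY KEY (user_id, state).

    Illustration only: this is not how Cassandra stores data internally.
    """

    def __init__(self):
        # partition key -> {clustering key -> remaining columns}
        self.partitions = {}

    def insert(self, user_id, state, action):
        # Writing the same (user_id, state) again replaces the row (upsert).
        self.partitions.setdefault(user_id, {})[state] = {"action": action}

    def rows_in_partition(self, user_id):
        # Rows inside a partition are ordered by clustering key.
        return sorted(self.partitions.get(user_id, {}).items())


table = ToyTable()
table.insert(1, "RJ", "login")
table.insert(1, "GJ", "logout")    # same partition, new clustering key: new row
table.insert(1, "RJ", "purchase")  # same full primary key: overwrites the first row

for state, columns in table.rows_in_partition(1):
    print(state, columns)
```

Partition 1 ends up with two rows, not three, because the third insert reuses the primary key (1, RJ). If the model had no clustering key, the inner map would collapse to a single row per user_id, which is the situation in the original table.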

2>If primary key is not one per record, can a primary key be split across many partitions?

A primary key cannot be split across many partitions. As explained above, the partition key part of the primary key always maps to exactly one partition, so the row with that primary key lives in that partition.

3>Can a partition have multiple primary keys?

In the example I gave, (1, RJ) and (1, GJ) are possible primary keys, and both belong to the single partition identified by the partition key value 1. So in that sense a partition can hold multiple primary keys.

4>Is the primary key itself decided to pick the partition or is the hashCode of the primary key used to pick a partition?

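A hash of the partition key (only that part of the primary key, not the clustering key) is used to pick the partition. Cassandra's partitioner (Murmur3Partitioner by default) hashes the partition key into a token, and the token determines which nodes own that partition.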
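As a rough illustration of hash-based placement, the Python sketch below hashes only the partition key and looks the result up on a small token ring. Everything in it is an assumption for demonstration: the three node names, the 32-bit token range, and the use of MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3Partitioner, which is not in the Python standard library).

```python
import bisect
import hashlib

TOKEN_SPACE = 2**32  # toy token range; real partitioners use a much larger one

# Hypothetical ring: (highest token owned, node name), sorted by token.
RING = [
    (TOKEN_SPACE // 3, "node-a"),
    (2 * TOKEN_SPACE // 3, "node-b"),
    (TOKEN_SPACE - 1, "node-c"),
]


def token_for(partition_key):
    # Hash only the partition key; clustering columns play no part in placement.
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # value in 0 .. TOKEN_SPACE - 1


def node_for(partition_key):
    # The owner is the first node whose token is >= the key's token.
    tokens = [token for token, _ in RING]
    index = bisect.bisect_left(tokens, token_for(partition_key))
    return RING[index][1]


primary_keys = [(1, "RJ"), (1, "GJ"), (2, "RJ")]
for user_id, state in primary_keys:
    print((user_id, state), "->", node_for(user_id))
```

Because `node_for` never looks at `state`, the primary keys (1, RJ) and (1, GJ) always land on the same node, which is why a primary key cannot be split across partitions.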

5>Is it fair to think of a partition as a file?

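Not exactly. A partition is a logical group of rows that share a partition key, not a file. On disk, Cassandra stores data in SSTables; one SSTable usually holds many partitions, and the rows of a single partition can be spread over several SSTables until compaction merges them. How large a partition grows depends on your data model.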