1>Is the Primary key supposed to be unique per record? (My guess would
be not.) To elaborate. Suppose my table looks like this
CREATE TABLE user_action (
    user_id int,
    action text,
    date_of_action date,
    PRIMARY KEY (user_id)
)
Yes, a primary key is unique per row. With the table you posted there can be only one row per user_id; a second insert with the same user_id overwrites the existing row instead of adding a new one. To allow multiple rows with the same user_id, you have to add a differentiating column. In Cassandra this is called a clustering key, and it forms part of the primary key.
The primary key is the combination of the partition key and any clustering keys. Cassandra uses the partition key to locate a partition, and the clustering keys to tell apart the rows inside that partition. If no clustering key is defined, as in your case, each partition holds a single row, so only one row is kept per user_id.
In the example below you can have several rows with the same user_id, one for each state. Here the primary key is (user_id, state): user_id is the partition key and state is the clustering key.
CREATE TABLE user_action (
    user_id int,
    state text,
    action text,
    date_of_action date,
    PRIMARY KEY (user_id, state)
);
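To make the difference between the two table definitions concrete, here is a minimal Python sketch. It is a toy model, not Cassandra's storage engine: the ToyTable class and its methods are hypothetical names invented for illustration. It only mimics the rule that a write to an existing primary key updates that row, while a new clustering value adds a row to the same partition.

```python
# Toy model only: a table is a dict of partitions, and each partition is a
# dict of rows keyed by the clustering values. ToyTable is a hypothetical
# illustration, not part of any Cassandra driver or of Cassandra itself.

class ToyTable:
    def __init__(self, partition_key, clustering_keys=()):
        self.partition_key = partition_key
        self.clustering_keys = tuple(clustering_keys)
        self.partitions = {}  # partition key value -> {clustering values -> row}

    def insert(self, **row):
        pk = row[self.partition_key]
        ck = tuple(row[name] for name in self.clustering_keys)
        # Writing to an existing primary key updates that row (upsert)
        # instead of creating a second one.
        self.partitions.setdefault(pk, {}).setdefault(ck, {}).update(row)

    def rows(self, pk):
        return list(self.partitions.get(pk, {}).values())


# Like PRIMARY KEY (user_id): no clustering key, so one row per user_id.
single = ToyTable("user_id")
single.insert(user_id=1, action="login")
single.insert(user_id=1, action="logout")

# Like PRIMARY KEY (user_id, state): state distinguishes rows in a partition.
multi = ToyTable("user_id", ["state"])
multi.insert(user_id=1, state="RJ", action="login")
multi.insert(user_id=1, state="GJ", action="logout")

print(len(single.rows(1)))  # 1
print(len(multi.rows(1)))   # 2
```

In the first table the second insert replaces the action of the only row for user_id 1; in the second table both rows survive because their primary keys, (1, RJ) and (1, GJ), differ.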
I am guessing I can have multiple rows with the same user_id
As explained above, you can have multiple rows with the same user_id only if you define a clustering key. With the table you quoted, that is not possible.
2>If primary key is not one per record, can a primary key be split
across many partitions?
A primary key cannot be split across partitions. As explained above, the partition key part of the primary key always maps to exactly one partition, so a given primary key identifies one row in one partition.
3>Can a partition have multiple primary keys?
In the example I gave, (1, RJ) and (1, GJ) are two possible primary keys, and both belong to the single partition identified by the partition key value 1. So in that sense a partition can hold multiple primary keys.
4>Is the primary key itself decided to pick the partition or is the
hashCode of the primary key used to pick a partition?
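A hash of the partition key (the first part of the primary key), not of the whole primary key, is used to pick the partition. Cassandra's partitioner turns the partition key into a token, and the token determines which nodes own that partition.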
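As a rough illustration of that placement rule, the Python sketch below hashes only the partition key and looks the result up on a small token ring. Everything in it is an assumption made for the example: the three node names, the 32-bit ring, and the use of truncated MD5 as the hash. Cassandra's default partitioner is Murmur3Partitioner, which produces different token values, so treat this as a sketch of the idea and not of the real algorithm.

```python
import bisect
import hashlib

# Hypothetical three-node ring; each node owns the tokens up to and
# including its own token. Names and ring size are made up for this sketch.
RING_SIZE = 2 ** 32
NODES = [
    (RING_SIZE // 3, "node-a"),
    (2 * RING_SIZE // 3, "node-b"),
    (RING_SIZE - 1, "node-c"),
]
TOKENS = [token for token, _ in NODES]


def token_for(partition_key):
    # Stand-in hash (MD5 truncated to 32 bits), not Cassandra's Murmur3 output.
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def node_for(primary_key):
    # Only the first component (the partition key) is hashed;
    # clustering columns play no part in placement.
    partition_key = primary_key[0]
    index = bisect.bisect_left(TOKENS, token_for(partition_key))
    return NODES[index][1]


# Two primary keys that share a partition key land on the same node.
print(node_for((1, "RJ")) == node_for((1, "GJ")))  # True
```

Because state never enters the hash, every row for user_id 1 is placed together, which is why a primary key cannot be split across partitions.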
5>Is it fair to think of a partition as a file?
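Not really, although how the data ends up on disk depends on your data model. A partition is a logical grouping of rows; on disk its data can be spread across several SSTable files, and a single SSTable usually holds many partitions.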