How Does Composite Column PartitionKey Work in Cassandra

Question

I am trying to figure out what advantages a compound partition key can provide. Consider the well-known weather station example below.

CREATE TABLE temperature ( state text, city text, event_time timestamp, temperature text, PRIMARY KEY ((state, city),event_time) );

Most of the time I query a single state, a set of cities, and a range of dates, so the query looks like

SELECT * FROM temperature WHERE state = 'NY' AND city IN ('manhattan', 'brooklyn', 'queens') AND event_time > '2016-01-01';

Assume I have a large data set: only a few states (fewer than 1000), but a very large number of cities per state (more than 100M). The data is replicated and distributed across different nodes.

Question: can you compare the differences using

PRIMARY KEY ((state, city), event_time)

PRIMARY KEY ((city, state), event_time)

PRIMARY KEY (state, city,event_time)

PRIMARY KEY (zipcode, event_time)

Thank you!

RussS · Accepted Answer · 2016-06-02T00:05:28

Composite Key

PRIMARY KEY ((state, city), event_time)
PRIMARY KEY ((city, state), event_time)

These two definitions are functionally equivalent. The composite partition key is the combined values of city and state, so you cannot fully specify a partition without both parts. Within each partition, cells are ordered by event_time. You will have one partition per distinct (state, city) pair, so at most #State * #City partitions.

[city, state] -> [event_time_0, event_time_1, event_time_2, event_time_3, ...]

You will be able to write queries like

SELECT * FROM TABLE WHERE CITY = X AND STATE = Y AND event_time (><=) SomeValue
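
As a concrete sketch of the query shape above, assuming the temperature table from the question with PRIMARY KEY ((state, city), event_time). The literal values are only illustrative, and exact error behaviour may vary between Cassandra versions:

```cql
-- Assumes the question's table, PRIMARY KEY ((state, city), event_time).

-- Both partition key columns are restricted by equality, so this reads
-- a single partition and slices it by the clustering column.
SELECT * FROM temperature
WHERE state = 'NY' AND city = 'brooklyn'
  AND event_time > '2016-01-01';

-- IN on the last partition key column reads one partition per listed city.
SELECT * FROM temperature
WHERE state = 'NY' AND city IN ('manhattan', 'brooklyn', 'queens')
  AND event_time > '2016-01-01';

-- Only part of the partition key is given, so the partition cannot be
-- located; this statement is rejected as written.
-- SELECT * FROM temperature WHERE state = 'NY';
```

The second statement matches the query in the question: each city in the IN list is a separate partition lookup.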

Compound Keys

PRIMARY KEY (state, city,event_time)

One partition is made for every state. This is probably bad: there are only on the order of hundreds of states/provinces, so you will have a very small number of partitions, each of them very large. Data will be laid out within the partition by city and then event_time.

[Illinois] --> [Chicago, 0], [Chicago, 1], [Peoria, 0], [Peoria, 1]

Queries will have to restrict city if they also restrict event_time, because city comes before event_time in the clustering order.
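
A minimal sketch of that restriction, assuming a table with the same columns as in the question but the compound key. The table name temperature_by_state is hypothetical, chosen here only to keep it apart from the original table:

```cql
-- Hypothetical table name; same columns as the question's table.
CREATE TABLE temperature_by_state (
    state text,
    city text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (state, city, event_time)
);

-- Only the partition key is restricted: reads the whole state partition.
SELECT * FROM temperature_by_state WHERE state = 'Illinois';

-- Clustering columns restricted in order (city, then event_time).
SELECT * FROM temperature_by_state
WHERE state = 'Illinois' AND city = 'Chicago'
  AND event_time > '2016-01-01';

-- Skips city while restricting event_time; rejected as written.
-- SELECT * FROM temperature_by_state
-- WHERE state = 'Illinois' AND event_time > '2016-01-01';
```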

PRIMARY KEY (zipcode, event_time)

You will have up to 10k partitions, each with a single cell for each event time.
