I am trying to figure out what advantages a compound partition key can provide. Consider the well-known weather station example below.
CREATE TABLE temperature ( state text, city text, event_time timestamp, temperature text, PRIMARY KEY ((state, city),event_time) );
Most of the time, I query a single state for a set of cities and a range of dates. A typical query looks like this:
SELECT * FROM temperature WHERE state = 'NY' AND city IN ('manhattan', 'brooklyn', 'queens') AND event_time > '2016-01-01';
Assume I have a large data set: only a few states (fewer than 1,000), but a very large number of cities per state (more than 100M). The data is replicated and distributed across multiple nodes.
Question: how do the following primary key definitions compare for this workload?
PRIMARY KEY ((state, city), event_time)
PRIMARY KEY ((city, state), event_time)
PRIMARY KEY (state, city, event_time)
PRIMARY KEY (zipcode, event_time)
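To make the four options concrete, here is a sketch of each one written out as a full table definition. The table names are made up for illustration, and the `zipcode` column is an assumption, since the original table has no such column. The comments only note which columns form the partition key and which are clustering columns.

```cql
-- Option 1: composite partition key (state, city); event_time is a clustering column.
CREATE TABLE temperature_by_state_city (
    state text,
    city text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((state, city), event_time)
);

-- Option 2: same two columns in the partition key, declared in the opposite order.
CREATE TABLE temperature_by_city_state (
    state text,
    city text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((city, state), event_time)
);

-- Option 3: state alone is the partition key; city and event_time are clustering columns.
CREATE TABLE temperature_by_state (
    state text,
    city text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (state, city, event_time)
);

-- Option 4: zipcode (hypothetical column) is the partition key; event_time is a clustering column.
CREATE TABLE temperature_by_zipcode (
    zipcode text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (zipcode, event_time)
);
```

In options 1 and 2 a query has to supply both `state` and `city` to identify a partition, whereas in option 3 `state` alone identifies it.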
Thank you!