Then, it seems PK is not used in clustering data within the node and
that sounds wrong. What if I have a simple primary key with just PK?
Will Cassandra only distribute data across nodes and not order data
within each node since there is no clustering column?
Good question. Let's try this out. I'll create a simple table and INSERT
some data:
aploetz@cqlsh:stackoverflow> CREATE TABLE programs
(name text PRIMARY KEY, data text);
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Tron');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Yori');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Quorra');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Clu');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Flynn');
aploetz@cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Zuze');
Now, let's run a query that should answer your question:
aploetz@cqlsh:stackoverflow> SELECT name, token(name) FROM programs;
 name   | system.token(name)
--------+----------------------
  Flynn | -1059892732813900311
   Zuze |  1815531347795840810
   Yori |  2854211700591734382
 Quorra |  3079126743186967718
   Tron |  6359222509420865788
    Clu |  8304850648940574176
(6 rows)
As you can see, they are definitely not in order by name, which is the partition key and lone PRIMARY KEY. But my query runs the token() function on name, which shows the hashed value of the partition key (name in this case). The results are ordered by that.
So to answer your question, Cassandra orders its partitions by the hashed value of the partition key. Note that this order is maintained throughout the cluster, not just on a single node. Therefore, results for an unbound query (one with no WHERE clause, which is not recommended in a multi-node configuration) will be ordered by the hashed value of the partition key, regardless of the number of nodes in the cluster.
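
To make the "regardless of the number of nodes" point concrete, here is a rough Python sketch that replays the token values from the query output above. It is a simplified model, not how Cassandra is implemented: it assumes a signed 64-bit token space (as with the default Murmur3Partitioner), splits that space into equal contiguous ranges with one range per node, and ignores vnodes and replication. The names `ring_boundaries`, `owner`, and `full_scan` are made up for this illustration.

```python
# Token values copied from the SELECT name, token(name) output above.
ROWS = {
    "Tron": 6359222509420865788,
    "Yori": 2854211700591734382,
    "Quorra": 3079126743186967718,
    "Clu": 8304850648940574176,
    "Flynn": -1059892732813900311,
    "Zuze": 1815531347795840810,
}

# Assumption: signed 64-bit token space.
MIN_TOKEN = -2**63
TOKEN_SPACE = 2**64


def ring_boundaries(node_count):
    """Inclusive upper token bound of each node's range (hypothetical even split)."""
    return [MIN_TOKEN + TOKEN_SPACE * (i + 1) // node_count - 1
            for i in range(node_count)]


def owner(token, boundaries):
    """Index of the node whose range contains the token."""
    for node, upper in enumerate(boundaries):
        if token <= upper:
            return node
    raise ValueError("token outside the ring")


def full_scan(rows, node_count):
    """Walk the nodes in ring order, reading each node's partitions in token order."""
    boundaries = ring_boundaries(node_count)
    per_node = [[] for _ in range(node_count)]
    for name, token in rows.items():
        per_node[owner(token, boundaries)].append((token, name))
    result = []
    for partitions in per_node:
        result.extend(name for _, name in sorted(partitions))
    return result


if __name__ == "__main__":
    for nodes in (1, 3, 6):
        print(nodes, full_scan(ROWS, nodes))
```

In this model, each node holds a contiguous slice of the token space, so concatenating the slices in ring order gives the same sequence whether there is one node or six. That sequence follows the tokens and has nothing to do with the alphabetical order of name.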