13
votes

I get the above error when I try to use following cql statement, not sure whats wrong with it.

CREATE TABLE Stocks(
  id uuid,
  market text,
  symbol text,
  value text,
  time timestamp,
  PRIMARY KEY(id)
) WITH CLUSTERING ORDER BY (time DESC);
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive

But this works fine, can't I use some column which is not part of primary key to arrange my rows ?

CREATE TABLE timeseries (
         ...   event_type text,
         ...   insertion_time timestamp,
         ...   event blob,
         ...   PRIMARY KEY (event_type, insertion_time)
         ... )
         ... WITH CLUSTERING ORDER BY (insertion_time DESC);
2

2 Answers

15
votes

"can't I use some column which is not part of primary key to arrange my rows?"

No, you cannot. From the DataStax documentation on the SELECT command:

ORDER BY clauses can select a single column only. That column has to be the second column in a compound PRIMARY KEY. This also applies to tables with more than two column components in the primary key.

Therefore, for your first CREATE to work, you will need to adjust your PRIMARY KEY to this:

PRIMARY KEY(id,time)

The second column of in a compound primary key is known as the "clustering column." This is the column that determines the on-disk sort order of data within a partitioning key. Note that last part in italics, because it is important. When you query your Stocks column family (table) by id, all "rows" of column values for that id will be returned, sorted by time. In Cassandra you can only specify order within a partitioning key (and not for your entire table), and your partitioning key is the first key listed in a compound primary key.

Of course the problem with this, is that you probably want id to be unique (which means that CQL will only ever return one "row" of column values per partitioning key). Requiring time to be part of the primary key negates that, and makes it possible to store multiple values for the same id. This is the problem with partitioning your data by a unique id. It might be a good idea in the RDBMS world, but it can make querying in Cassandra more difficult.

Essentially, you are going to need to revisit your data model here. For instance, if you wanted to query prices over time, you could name the table something like "StockPriceEvents" with a primary key of (id,time) or (symbol,time). Querying that table would give you the prices recorded for each id or symbol, sorted by time. Now that may or may not be of any value to your use case. Just trying to explain how primary keys and sort order work in Cassandra.

Note: You should really use column names that have more meaning. Things like "id," "time," and "timeseries" are pretty vague don't really describe anything about the context in which they are used.

2
votes

While creating a Table in Cassandra with "CLUSTERING ORDER BY" option, make sure the clustering column is Primary column.

Below table created with clustering column ,but the clustering column "Datetime" is not a Primary key column. Hence below error.

ERROR_SCRIPT

cqlsh> CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data ( 
       ...    IP text,
       ...    URL text,
       ...    Status text,
       ...    UserAgent text,
       ...    Datetime timestamp,
       ...    PRIMARY KEY (IP)
       ...    ) WITH CLUSTERING ORDER BY (Datetime DESC);

ERROR: InvalidRequest: Error from server: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"

CORRECTED_SCRIPT (Where the "Datetime" is added into the Primary Key columns)

cqlsh> CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data ( 
       ...    IP text,
       ...    URL text,
       ...    Status text,
       ...    UserAgent text,
       ...    Datetime timestamp,
       ...    PRIMARY KEY (IP,Datetime)
       ...    ) WITH CLUSTERING ORDER BY (Datetime DESC);