How do I obtain all the row names in a Cassandra table, efficiently?

Question

I have a straightforward schema in Cassandra where I am storing a large amount of financial markets time series as per Pattern 1 in the recommendation here ie: There are about 2000 tickers, each in their own row, and there are between 100k and 1m time series data points ("columns") for each of these. How do I efficiently get the ticker names (ie the 2000 ticker names? In the above analogy, the blue "WeatherStationID" cells?). The only difference from the above is that I have the rows ordered by time DESC so my actual table looks like this:

CREATE TABLE ts.mindesc (
    ticker text,
    time timestamp,
    close float,
    high float,
    low float,
    numevents int,
    open float,
    source text,
    value float,
    PRIMARY KEY (ticker, time)
) WITH CLUSTERING ORDER BY (time DESC)

Do I have to select * from the whole table above a certain timestamp value and parse this? The problem with that is that not all tickers have recent data so I might have to go quite far back in terms of time to capture them all, in which case for tickers with lots of recent data I could be getting through large amounts of redundant information.

Aaron Aaron · Accepted Answer · 2015-11-16T14:36:40

Actually, you should be able to query the partition keys very quickly by using DISTINCT:

SELECT DISTINCT ticker FROM ts.mindesc;

While 100k rows in CQL probably wouldn't perform well with an unbound WHERE, 2000 partition keys shouldn't be a problem.

How do I obtain all the row names in a Cassandra table, efficiently?

1 Answers