I have a straightforward schema in Cassandra where I am storing a large amount of financial markets time series as per Pattern 1 in the recommendation here ie: There are about 2000 tickers, each in their own row, and there are between 100k and 1m time series data points ("columns") for each of these. How do I efficiently get the ticker names (ie the 2000 ticker names? In the above analogy, the blue "WeatherStationID" cells?). The only difference from the above is that I have the rows ordered by time DESC so my actual table looks like this:
CREATE TABLE ts.mindesc (
ticker text,
time timestamp,
close float,
high float,
low float,
numevents int,
open float,
source text,
value float,
PRIMARY KEY (ticker, time)
) WITH CLUSTERING ORDER BY (time DESC)
Do I have to select * from the whole table above a certain timestamp value and parse this? The problem with that is that not all tickers have recent data so I might have to go quite far back in terms of time to capture them all, in which case for tickers with lots of recent data I could be getting through large amounts of redundant information.