I know there are many time-series questions on here but mine does not seem to comfortably fit with the given solutions. I am also new to Cassandra so I might be approaching this with the wrong mindset. Bear with me.
I am receiving search data in the form:
datetime_searched, term_used, product_found
and the query I would like to make:
Given a start-date and an end-date, return all term-product pairs that fall in that time window. Initially, the window will be a month long. This may (read: will) change.
For example, given the following data:
2013-11-20 00:00:00, "christmas", "decorated tree"
2014-12-01 20:00:00, "christmas", "wrapping paper"
2014-12-23 15:00:00, "christmas", "decorated tree" (duplicate term-product)
and a query for the time-range 2014-12-01 to 2015-01-01, I would like to be able to retrieve:
"christmas", "wrapping paper"
"christmas", "decorated tree"
My initial approach looked like most examples for time series data:
CREATE TABLE search_terms (
datetime_searched timestamp,
term_used text,
product_found text,
PRIMARY KEY (term_used, date_searched)
);
SELECT term_used, product_found
FROM search_terms
WHERE datetime_searched > [start]
AND datetime_searched < [end];
but this requires me to have secondary indexes and/or allow filtering, which seems to be something I should avoid if I'm only capturing a small percentage of the data being filtered.
My second idea was to create time buckets, but this solution seems to work only if I limit the query to the buckets. It also seems to create hotspots - in my initial case, a month-long hotspot. For example, for daily buckets:
CREATE TABLE search_terms_by_day (
datetime_searched timestamp,
day_searched timestamp,
term_used text,
product_found text,
PRIMARY KEY (day_searched)
);
SELECT term_used, product_found
FROM search_terms_by_day
WHERE day_searched=[my limited query's bucket];
So what are my options? Do I constrain my requests to the bucket size, possibly creating many CFs with different bucket sizes, all while creating hotspots; am I forced to use secondary indexes; or is there another option I am unaware of?
Thanks in advance.