I need to find out if the count of records in Cassandra table is greater than certain number, e.g 10000.
I still don't have large data set, but at a large scale, with possible billions of records, how would I be able to achieve this efficiently?
There could potentially be billions of records, or just thousands. I just need to know if there are more or less than 10K.
This below doesn't seem right, I think it would fail or be very slow for large number of records.
SELECT COUNT(*) FROM data WHERE sourceId = {id} AND timestamp < {endDate} AND timestamp > {startDate};
I could also do something like this:
SELECT * FROM data WHERE sourceId = {id} AND timestamp < {endDate} AND timestamp > {startDate} LIMIT 10000;
and count in memory
I can't have new table used for counting, e.g, when a new record is written, increase counter, that option is unacceptable.
Is there some other way to do this? Select with limit looks dumb, but seems most viable.
sourceId is partition key and timestamp is clustering key.
Cassandra version is 3.11.4, and I work in Spring if it has any relevance.