Cassandra CQL method for paging through all rows

Question

I want to programmatically examine all the rows in a large Cassandra table, and was hoping to use CQL. I know I could do this with Thrift, getting 10,000 (or so) rows at a time with multiget and passing the last retrieved key into the next multiget call. But I have looked through all the documentation on CQL SELECT, and there doesn't seem to be a way to do this. I have resorted to setting the SELECT limit higher and higher, and raising the timeout to match.

Is there an undocumented way to pass a starting point to a CQL SELECT, or do I just need to break down and rewrite my code using the Thrift API?
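The key-as-cursor loop described in the question (fetch a batch, then pass the last key back in as the next starting point) can be sketched independently of any client library. This is a minimal sketch, not Thrift code: `iterate_all` and `fetch_page(start_key, limit)` are hypothetical names, and `fetch_page` is assumed to return up to `limit` (key, value) rows in a stable order beginning at `start_key` (inclusive), with "" meaning "from the start". Because the start is assumed inclusive, every page after the first repeats the previous page's last row, which the loop drops.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, str]  # (key, value); hypothetical row shape for illustration


def iterate_all(fetch_page: Callable[[str, int], List[Row]],
                page_size: int = 10000) -> Iterator[Row]:
    """Yield every row exactly once by using the last key of each page
    as the (inclusive) start key of the next request."""
    if page_size < 2:
        # With an inclusive start, a 1-row page would only ever repeat itself.
        raise ValueError("page_size must be at least 2")
    start = ""          # assumed to mean "from the beginning"
    first_page = True
    while True:
        page = fetch_page(start, page_size)
        # Pages after the first begin with the row already yielded last time.
        yield from (page if first_page else page[1:])
        if len(page) < page_size:
            return      # short page: nothing left to fetch
        start = page[-1][0]
        first_page = False


# Usage against a hypothetical in-memory backend (sorted by key).
data = sorted((f"k{i:03d}", str(i)) for i in range(25))


def fake_fetch(start: str, limit: int) -> List[Row]:
    return [row for row in data if row[0] >= start][:limit]


all_keys = [key for key, _ in iterate_all(fake_fetch, page_size=10)]
```

If the backend's start bound is exclusive instead, the `page[1:]` adjustment should be dropped. With an inclusive bound, `page_size` must be at least 2, otherwise each page would contain only the already-seen row and the loop could never advance.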

issues.apache.org/jira/browse/CASSANDRA-3771 is super intriguing: 'CQL < 3 silently turns a "key >= X" into "token(key) >= token(X)"'..."the only reason to do this with non-[B]OPP is to paginate through a large query"..."As someone who uses key >= X with random partitioner all the time to walk through results" — Tao Starbow

Tao Starbow · Accepted Answer · 2012-08-09T18:02:05

It turns out that greater-than and less-than have a very non-intuitive but useful behavior (at least in CQL2; I haven't checked CQL3 yet): they compare the tokens, not the key values. Here is an example:

> create table users (KEY varchar PRIMARY KEY, data varchar);
> insert into users (KEY, 'data') values ('1', 'one');
> insert into users (KEY, 'data') values ('2', 'two');
> insert into users (KEY, 'data') values ('3', 'three');
> insert into users (KEY, 'data') values ('4', 'four');
> select * from users;
   3 | three
   2 |   two
   1 |   one
   4 |  four
> select * from users LIMIT 1;
   3 | three
> select * from users WHERE KEY > '3' LIMIT 1;
   2 |  two
> select * from users WHERE KEY > '2' LIMIT 1;
   1 |  one
> select * from users WHERE KEY > '1' LIMIT 1;
   4 | four
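The answer's observation can be modeled in a few lines to show why repeatedly feeding the last key back into `KEY > '...'` visits every row once. This is a minimal in-memory sketch, not a Cassandra client: `token`, `in_token_order`, `select_after` and `walk` are hypothetical names, and the MD5-based `token` is only an assumed approximation of RandomPartitioner, so the order it prints need not match the 3, 2, 1, 4 in the session above.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple


def token(key: str) -> int:
    """Assumed stand-in for a RandomPartitioner-style token: MD5 of the key,
    read as a signed big-endian integer, then made non-negative."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def in_token_order(table: Dict[str, str]) -> List[Tuple[str, str]]:
    # Rows come back ordered by token, not by key (the behaviour seen above).
    return sorted(table.items(), key=lambda kv: token(kv[0]))


def select_after(table: Dict[str, str], key: str, limit: int) -> List[Tuple[str, str]]:
    """Simulates `SELECT * FROM t WHERE KEY > 'key' LIMIT n` as the answer
    describes it: the '>' compares token(KEY) with token('key')."""
    bound = token(key)
    return [kv for kv in in_token_order(table) if token(kv[0]) > bound][:limit]


def walk(table: Dict[str, str], page_size: int = 1) -> Iterator[Tuple[str, str]]:
    """Page through every row: first page with a plain LIMIT, then keep
    asking for rows whose token is greater than that of the last key seen."""
    page = in_token_order(table)[:page_size]
    while page:
        yield from page
        page = select_after(table, page[-1][0], page_size)


users = {"1": "one", "2": "two", "3": "three", "4": "four"}
visited = [key for key, _ in walk(users)]
print(visited)  # token order; not guaranteed to match the 3, 2, 1, 4 shown above
```

Because `>` is strict, each page starts just after the last key seen, so there is no duplicate row to drop; the trade-off is that two distinct keys sharing a token would cause one of them to be skipped. In CQL3 the same comparison is written explicitly, e.g. `SELECT * FROM users WHERE token(KEY) > token('3') LIMIT 1;`.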
