So, my original problem was using the token() function to page through a large data set in Cassandra 1.2.9, as explained and answered here: Paging large resultsets in Cassandra with CQL3 with varchar keys
The accepted answer got the select working with tokens and chunk size, but another problem manifested itself.
My table looks like this in cqlsh:
key | column1 | value
---------------+-----------------------+-------
85.166.4.140 | county_finnmark | 4
85.166.4.140 | county_id_20020 | 4
85.166.4.140 | municipality_alta | 2
85.166.4.140 | municipality_id_20441 | 2
93.89.124.241 | county_hedmark | 24
93.89.124.241 | county_id_20005 | 24
The primary key is a composite of key and column1. In CLI, the same data looks like this:
get ip['85.166.4.140'];
=> (counter=county_finnmark, value=4)
=> (counter=county_id_20020, value=4)
=> (counter=municipality_alta, value=2)
=> (counter=municipality_id_20441, value=2)
Returned 4 results.
The problem
When using cql with a limit of i.e. 100, the returned results may stop in the middle of a record, like this:
key | column1 | value
---------------+-----------------------+-------
85.166.4.140 | county_finnmark | 4
85.166.4.140 | county_id_20020 | 4
leaving these to "rows" (columns) out:
85.166.4.140 | municipality_alta | 2
85.166.4.140 | municipality_id_20441 | 2
Now, when I use the token() function for the next page like, these two rows are skipped:
select * from ip where token(key) > token('85.166.4.140') limit 10;
Result:
key | column1 | value
---------------+------------------------+-------
93.89.124.241 | county_hedmark | 24
93.89.124.241 | county_id_20005 | 24
95.169.53.204 | county_id_20006 | 2
95.169.53.204 | county_oppland | 2
So, no trace of the last two results from the previous IP address.
Question
How can I use token() for paging without skipping over cql rows? Something like:
select * from ip where token(key) > token(key:column1) limit 10;