I have a C* column family that stores event-like data. The column family was created in CQL3 like this:
CREATE TABLE event (
  hour text,
  stamp timeuuid,
  values map<text, text>,
  PRIMARY KEY (hour, stamp)
) WITH CLUSTERING ORDER BY (stamp DESC);
The partitioner is Murmur3Partitioner. I then tried to build a Spark query against this data through the Calliope library (a simplified sketch of my read code follows the list below). The results show two problems:
- In my case there are more than 1000 records per partition key (the 'hour' field), but the response contains only the first 1000 records per key. I can increase the page size in the query to receive more data, but as far as I understand, it should be the paginator's job to walk through the data and slice it.
- I receive each record more than once.
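For reference, my read code looks roughly like the sketch below. It is simplified: the keyspace name "my_keyspace", the Event case class, and the unmarshaller are placeholders rather than my exact code, and I pieced the API shape together from the Calliope docs, so I may be holding it wrong:

import java.nio.ByteBuffer
import com.tuplejump.calliope.CasBuilder
import com.tuplejump.calliope.Implicits._
import com.tuplejump.calliope.utils.RichByteBuffer._

// Placeholder row type: 'stamp' and 'values' are kept as raw
// ByteBuffers here just to keep the sketch short.
case class Event(hour: String, stamp: ByteBuffer, values: ByteBuffer)

// Unmarshaller from the CQL3 key and value column maps to Event;
// the implicit ByteBuffer-to-String conversion for 'hour' comes
// from RichByteBuffer.
implicit def rowToEvent(keys: Map[String, ByteBuffer],
                        values: Map[String, ByteBuffer]): Event =
  Event(keys("hour"), keys("stamp"), values("values"))

// "my_keyspace" is a placeholder; sc is an existing SparkContext.
val cas = CasBuilder.cql3.withColumnFamily("my_keyspace", "event")
val events = sc.cql3Cassandra[Event](cas)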
Regarding the first problem, I got an answer from the Calliope author saying that the CQL3 driver must paginate the data, and he recommended a DataStax article to me. But I can't find an answer on how to build the query with the right instructions for the driver.
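For what it's worth, the only paging knob I have found so far is the page row size of the underlying CqlPagingInputFormat, which in plain Hadoop terms is set as below (the 10000 value is just an example). Whether and how Calliope passes such a Configuration through to the input format is exactly the part I can't figure out:

import org.apache.hadoop.conf.Configuration
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper

val hadoopConf = new Configuration()
// CqlPagingInputFormat fetches 1000 CQL rows per page by default.
// Raising this does get me more rows back, but I expected the
// input format to keep paging until the partition is exhausted.
CqlConfigHelper.setInputCQLPageRowSize(hadoopConf, "10000")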
Regarding the second problem, I found that it was an issue with the Hadoop connector in Cassandra < 1.2.11. But I use C* 2.0.3 and rebuilt Spark with the required versions of the libraries. I also use Calliope version 0.9.0-C2-EA.
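For completeness, this is roughly how I pin the Cassandra artifacts in the rebuild (an sbt sketch from memory; the exact module list in my real build may differ):

// build.sbt (sketch): force the Cassandra 2.0.3 Hadoop connector classes
libraryDependencies ++= Seq(
  "org.apache.cassandra" % "cassandra-all"    % "2.0.3",
  "org.apache.cassandra" % "cassandra-thrift" % "2.0.3"
)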
Could you point me to documentation or code samples that explain the right way to solve these problems, or that demonstrate workarounds? I suppose I'm using the C*-to-Spark connector improperly, but I can't find a solution.
Thank you in advance.