36
votes

I am wondering how I can achieve pagination using Cassandra.

Let us say that I have a blog. The blog lists at most 10 posts per page. To see further posts, a user must click on the pagination menu to access page 2 (posts 11-20), page 3 (posts 21-30), etc.

Using SQL under MySQL, I could do the following:

SELECT * FROM posts LIMIT 20,10;

The first argument of LIMIT is the offset from the beginning of the result set and the second is the number of rows to fetch. The example above skips the first 20 rows and returns the next 10 (rows 21-30).

How can I achieve the same effect in CQL?

I have found some solutions on Google, but all of them require "the last result from the previous query". That works for a "next" button that moves to the following set of 10 results, but what if I want to jump from page 1 to page 5?

6 Answers

13
votes

Try using the token function in CQL: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useToken.html

Another suggestion: if you are using DSE, Solr supports deep paging: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

63
votes

You don't need to use tokens if you are using Cassandra 2.0+.

Cassandra 2.0 has automatic paging. Instead of using the token function to implement paging yourself, it is now a built-in feature.

Now developers can iterate over the entire result set without having to care that its size is larger than the available memory. As the client code iterates over the results, further rows are fetched while old ones are dropped.

Looking at this in Java, note that the SELECT statement matches all rows, while the fetch size (the number of rows retrieved per page) is set to 24.

I’ve shown a simple statement here, but the same code can be written with a prepared statement coupled with a bound statement. It is possible to disable automatic paging if it is not desired. It is also important to test various fetch size settings, since you will want to keep the memory footprint small enough, but not so small that too many round-trips to the database are taken. Check out this blog post to see how paging works server side.

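// Java driver 2.x/3.x: classes come from com.datastax.driver.core
Statement stmt = new SimpleStatement(
                  "SELECT * FROM raw_weather_data"
                  + " WHERE wsid= '725474:99999'"
                    + " AND year = 2005 AND month = 6");
stmt.setFetchSize(24); // rows per page
ResultSet rs = session.execute(stmt);
// Iterating the ResultSet fetches further pages transparently,
// so every row is visited, including those on the last page.
for (Row row : rs) {
   // Optionally prefetch the next page in the background when the current one runs low.
   if (rs.getAvailableWithoutFetching() == 10 && !rs.isFullyFetched()) {
      rs.fetchMoreResults();
   }
   System.out.println(row);
}

14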
votes

Manual Paging

The driver exposes a PagingState object that represents where we were in the result set when the last page was fetched:

ResultSet resultSet = session.execute("your query");
// iterate the result set...
PagingState pagingState = resultSet.getExecutionInfo().getPagingState();

This object can be serialized to a String or a byte array:

String string = pagingState.toString();
byte[] bytes = pagingState.toBytes();

This serialized form can be saved in some form of persistent storage to be reused later. When that value is retrieved later, we can deserialize it and reinject it in a statement:

PagingState pagingState = PagingState.fromString(string);
Statement st = new SimpleStatement("your query");
st.setPagingState(pagingState);
ResultSet rs = session.execute(st);

Note that the paging state can only be reused with the exact same statement (same query string, same parameters). Also, it is an opaque value that is only meant to be collected, stored and re-used. If you try to modify its contents or reuse it with a different statement, the driver will raise an error.

Src: http://datastax.github.io/java-driver/manual/paging/
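
Since a paging state only points at the next page, one way to reach an arbitrary page (say, page 5) is to walk the pages from the start and discard the ones you do not need. Below is a minimal sketch of that idea in JavaScript. It assumes a client whose execute(query, params, options) resolves to an object with rows and pageState, which is the shape used by the Node.js cassandra-driver; fetchPageByNumber and makeFakeClient are hypothetical names, and the fake client is only an in-memory stand-in so the sketch can run without a cluster.

```javascript
// Sketch: reach page N by walking page states from the first page.
// Costs N round trips, so it is only reasonable for shallow page numbers.
async function fetchPageByNumber(client, query, params, pageNumber, pageSize) {
  let pageState = null; // null means "start from the first page"
  for (let page = 1; ; page++) {
    const options = { prepare: true, fetchSize: pageSize };
    if (pageState) {
      options.pageState = pageState;
    }
    const result = await client.execute(query, params, options);
    if (page === pageNumber) {
      return result.rows;
    }
    if (!result.pageState) {
      return []; // the result set ended before the requested page
    }
    pageState = result.pageState;
  }
}

// Hypothetical in-memory stand-in for a real client (demonstration only).
// It encodes the position of the next row as the "page state".
function makeFakeClient(allRows) {
  return {
    async execute(query, params, options) {
      const start = options.pageState ? Number(options.pageState) : 0;
      const end = start + options.fetchSize;
      return {
        rows: allRows.slice(start, end),
        pageState: end < allRows.length ? String(end) : null,
      };
    },
  };
}
```

With a real client the call would look like fetchPageByNumber(client, "your query", [], 5, 10). Caching the intermediate page states (per query) avoids repeating the walk on later requests.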

3
votes

See the section "Use paging state token to get next result" in this doc:

https://datastax.github.io/php-driver/features/result_paging/

We can use the "paging state token" to paginate at the application level. The PHP logic should look like this:

<?php
$limit = 10;
$offset = 20;

$cluster   = Cassandra::cluster()->withContactPoints('127.0.0.1')->build();
$session   = $cluster->connect("simplex");
$statement = new Cassandra\SimpleStatement("SELECT * FROM paging_entries Limit ".($limit+$offset));

// The first request fetches the first $offset rows as a single page; these are skipped, not displayed.
$result = $session->execute($statement, new Cassandra\ExecutionOptions(array('page_size' => $offset)));

// Each following request resumes from the paging state token and fetches $limit rows.
while ($result->pagingStateToken()) {
    $result = $session->execute($statement, new Cassandra\ExecutionOptions(array('page_size' => $limit, 'paging_state_token' => $result->pagingStateToken())));
    foreach ($result as $row) {
      printf("key: '%s' value: %d\n", $row['key'], $row['value']);
    }
}
?>

1
votes

Although the count is available in CQL, so far I have not seen a good solution for the offset part...

So... one solution I have been contemplating was to create sets of pages using a background process.

In some table, I would create the blog page A as a set of references to page 1, 2, ... 10. Then another entry for blog page B pointing to pages 11 to 20, etc.

In other words, I would build my own index with a row key set to the page number. You may still make it somewhat flexible, since you can let the user choose to see 10, 20 or 30 references per page. For example, when set to 30, you display sets 1, 2, and 3 as page A, sets 4, 5, 6 as page B, etc.

And if you have a backend process to handle all of that, you can update your lists as new pages are added and old pages are deleted from the blog. The process should be really fast (like 1 min. for 1,000,000 rows if even that slow...) and then you can find the pages to display in your list pretty much instantaneously. (Obviously, if you are to have thousands of users each posting hundreds of pages... that number can grow quickly.)
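
The index idea above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: a Map stands in for the Cassandra table keyed by page number, each set holds 10 post ids, and buildPageIndex and getDisplayPage are hypothetical names rather than part of any driver.

```javascript
// Build the index: set number (1-based) -> list of post ids.
// In a real system a background process would write these rows to a table.
function buildPageIndex(postIds, setSize) {
  const index = new Map();
  for (let i = 0; i < postIds.length; i += setSize) {
    index.set(i / setSize + 1, postIds.slice(i, i + setSize));
  }
  return index;
}

// Serve one display page by combining consecutive sets.
// displaySize must be a multiple of setSize (10, 20, 30, ...).
function getDisplayPage(index, setSize, displayPage, displaySize) {
  if (displaySize % setSize !== 0) {
    throw new Error("displaySize must be a multiple of setSize");
  }
  const setsPerPage = displaySize / setSize;
  const firstSet = (displayPage - 1) * setsPerPage + 1;
  const ids = [];
  for (let s = firstSet; s < firstSet + setsPerPage; s++) {
    ids.push(...(index.get(s) || [])); // missing sets mean we ran past the end
  }
  return ids;
}
```

For example, with 30 posts per display page, getDisplayPage(index, 10, 2, 30) reads sets 4, 5 and 6, which matches the "page B" case described above. Each lookup is by key, so jumping to any page costs the same.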

Where it becomes more complicated is if you want to offer a complex WHERE clause. By default a blog shows you a list of all the posts from the newest to the oldest. You could also offer lists of posts with the tag Cassandra. Maybe you want to reverse the order, etc. That makes it difficult unless you have some advanced way to create your index(es). On my end I have a C-like language which peeks and pokes at the values in a row to (a) select them and, if selected, (b) sort them. In other words, on my end I can already have WHERE clauses as complex as what you'd have in SQL. However, I do not yet break up my lists into pages. Next step I suppose...

0
votes

Using cassandra-node driver for Node.js (Koa.js, Marko.js): Pagination Problem

Due to the absence of skip functionality, we need a workaround. Below is an implementation of manual paging for a Node app, in case it gives anyone an idea.

  • code for simple users list
  • navigate between next and previous page states
  • easy to replicate

There are two solutions which I am going to state here, but I only give the code for solution 1 below.

Solution 1: Maintain page states for the next and previous records (maintain a stack or whatever data structure fits best).

Solution 2: Loop through all records with a limit, save all possible page states in a variable, and generate the pages from their pageStates.
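
The bookkeeping behind Solution 1 can be isolated into a small stack. The following is a minimal sketch, not the code used in this answer: PageStateStack is a hypothetical helper, and the page states are treated as opaque strings, as returned by the driver in result.pageState.

```javascript
// Tracks the page states needed to move forward and backward.
// states[i] is the page state that fetches page i + 2; page 1 needs no state.
class PageStateStack {
  constructor(states = []) {
    this.states = states.slice();
  }

  // Moving forward: remember the state that fetches the page about to be shown.
  next(nextPageState) {
    this.states.push(nextPageState);
    return nextPageState;
  }

  // Moving back: drop the current page's state; the new top (if any)
  // fetches the previous page, and null means "first page".
  previous() {
    this.states.pop();
    return this.states.length > 0 ? this.states[this.states.length - 1] : null;
  }

  // 1-based number of the page the stack currently points at.
  get pageNumber() {
    return this.states.length + 1;
  }
}
```

The value returned by next() or previous() is what would be passed as the pageState query option. Because the states are plain strings, the stack can be round-tripped through a hidden form field, for example as a comma-separated list.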

Using this commented-out code in the model, we can get the states for all pages:

            //for the next flow
            //if (result.nextPage) {
            // Retrieve the following pages:
            // the same row handler from above will be used
            // result.nextPage();
            //}

Router Functions

    var userModel = require('/models/users');
          public.get('/users', users);
          public.post('/users', filterUsers);

    var users = function* () {//get request
        var data = {};
        var pageState = { "next": "", "previous": "" };
        try {
            var userCount = yield userModel.Count();//count all users with basic count query

            var currentPage = 1;
            var pager = yield generatePaging(currentPage, userCount, pagingMaxLimit);
            var userList = yield userModel.List(pager);
            data.pageNumber = currentPage;
            data.TotalPages = pager.TotalPages;
            console.log('--------------what now--------------');
            data.pageState_next = userList.pageStates.next;
            data.pageState_previous = userList.pageStates.previous;
            console.log("next ", data.pageState_next);
            console.log("previous ", data.pageState_previous);

            data.previousStates = null;

            data.isPrevious = false;
            if ((userCount / pagingMaxLimit) > 1) {
                data.isNext = true;
            }

            data.userList = userList;
            data.totalRecords = userCount;
            console.log('--------------------userList--------------------', data.userList);
            //pass to html template
        }
        catch (e) {
            console.log("err ", e);
            log.info("userList error : ", e);
        }
   this.body = this.stream('./views/userList.marko', data);
   this.type = 'text/html';
    };

    //post filter and get list
    var filterUsers = function* () {
        console.log("<------------------Form Post Started----------------->");
        var data = {};
        var totalCount;
        data.isPrevious = true;
        data.isNext = true;

        var form = this.request.body;
        console.log("----------------formdata--------------------", form);
        var currentPage = parseInt(form.hdpagenumber);//page number hidden in html
        console.log("-------before current page------", currentPage);
        var pageState = null;
        try {
            var statesArray = [];
            if (form.hdallpageStates && form.hdallpageStates !== '') {
                statesArray = form.hdallpageStates.split(',');
            }
            console.log(statesArray);

            //develop stack to track paging states
            if (form.hdpagestateRequest === 'next') {
                console.log('--------------------------next---------------------');
                currentPage = currentPage + 1;
                statesArray.push(form.hdpageState_next);
                pageState = form.hdpageState_next;
            }
            else if (form.hdpagestateRequest === 'previous') {
                console.log('--------------------------pre---------------------');
                currentPage = currentPage - 1;
                // The last entry is the state of the current page: drop it.
                // The new last entry (if any) is the state of the previous page;
                // an empty array means we are back on the first page.
                statesArray.pop();
                pageState = statesArray.length > 0 ? statesArray[statesArray.length - 1] : null;
            }
            else if (form.hdispaging === 'false') {
                currentPage = 1;
                pageState = null;
                statesArray = [];
            }


            data.previousStates = statesArray;
            console.log("paging true");

            totalCount = yield userModel.Count();

            var pager = yield generatePaging(currentPage, totalCount, pagingMaxLimit);//use the updated page number
            data.pageNumber = currentPage;
            data.TotalPages = pager.TotalPages;

            //filter function - not yet constructed
            var searchUsers = yield userModel.searchList(pager, pageState);
            data.userList = searchUsers;//same key as in the GET handler, since both render userList.marko
            if (searchUsers.pageStates) {
                data.pageStates = searchUsers.pageStates;
                data.next = searchUsers.nextPage;
                data.pageState_next = searchUsers.pageStates.next;
                data.pageState_previous = searchUsers.pageStates.previous;

                //show previous and next buttons accordingly
                if (currentPage == 1 && pager.TotalPages > 1) {
                    data.isPrevious = false;
                    data.isNext = true;
                }
                else if (currentPage == 1 && pager.TotalPages <= 1) {
                    data.isPrevious = false;
                    data.isNext = false;
                }
                else if (currentPage >= pager.TotalPages) {
                    data.isPrevious = true;
                    data.isNext = false;
                }
                else {
                    data.isPrevious = true;
                    data.isNext = true;
                }
            }
            else {
                data.isPrevious = false;
                data.isNext = false;
            }
            console.log("response ", searchUsers);
            data.totalRecords = totalCount;

           //pass to html template
        }
        catch (e) {
            console.log("err ", e);
            log.info("user list error : ", e);
        }
        console.log("<------------------Form Post Ended----------------->");
   this.body = this.stream('./views/userList.marko', data);
   this.type = 'text/html';
    };

    //Paging function
    var generatePaging = function* (currentpage, count, pageSizeTemp) {
        var paging = new Object();
        var pagesize = pageSizeTemp;
        var totalPages = 0;
        var pageNo = currentpage == null ? null : currentpage;
        var skip = pageNo == null ? 0 : parseInt(pageNo - 1) * pagesize;
        var pageNumber = pageNo != null ? pageNo : 1;
        totalPages = pagesize == null ? 0 : Math.ceil(count / pagesize);
        paging.skip = skip;
        paging.limit = pagesize;
        paging.pageNumber = pageNumber;
        paging.TotalPages = totalPages;
        return paging;
    };

Model Functions

    var clientdb = require('../utils/cassandradb')();
    var Users = function (options) {
      //this.init();
      _.assign(this, options);
    };

    Users.List = function* (pager) {//first time
            var myresult; var res = [];
            res.pageStates = { "next": "", "previous": "" };

            // the router passes the pager object; fetchSize must be a number
            const options = { prepare: true, fetchSize: pager.limit };
            console.log('----------did i appeared first?-----------');

            yield new Promise(function (resolve, reject) {
                clientdb.eachRow('SELECT * FROM users_lookup_history', [], options, function (n, row) {
                    console.log('----paging----rows');
                    res.push(row);
                }, function (err, result) {
                    if (err) {
                        console.log("error ", err);
                    }
                    else {
                        res.pageStates.next = result.pageState;
                        res.nextPage = result.nextPage;//next page function
                    }
                    resolve(result);
                });
            }).catch(function (e) {
                console.log("error ", e);
            }); //promise ends

            console.log('page state ', res.pageStates);
            return res;
        };

        Users.searchList = function* (pager, pageState) {//paging filtering
            console.log("|------------Query Started-------------|");
            console.log("pageState if any ", pageState);
            var res = [], myresult;
            res.pageStates = { "next": "" };
            var query = "SELECT * FROM users_lookup_history ";
            var params = [];

            console.log('current pageState ', pageState);
            const options = { pageState: pageState, prepare: true, fetchSize: pager.limit };
            console.log('----------------did i appeared first?------------------');

            yield new Promise(function (resolve, reject) {
                clientdb.eachRow(query, [], options, function (n, row) {
                    console.log('----Users paging----rows');
                    res.push(row);
                }, function (err, result) {
                    if (err) {
                        console.log("error ", err);
                    }
                    else {
                        res.pageStates.next = result.pageState;
                        res.nextPage = result.nextPage;
                    }
                    //for the next flow
                    //if (result.nextPage) {
                    // Retrieve the following pages:
                    // the same row handler from above will be used
                    // result.nextPage();
                    //}
                    resolve(result);
                });
            }).catch(function (e) {
                console.log("error ", e);
                log.info('search list error : ', e);
            }); //promise ends

            console.log('page state ', pageState);

            console.log("|------------Query Ended-------------|");
            return res;
        };

Html side

        <div class="box-footer clearfix">
        <ul class="pagination pagination-sm no-margin pull-left">
             <if test="data.isPrevious == true">
             <li><a class='submitform_previous' href="">Previous</a></li>
             </if>
             <if test="data.isNext == true">
                <li><a class="submitform_next" href="">Next</a></li>
             </if>
         </ul>
         <ul class="pagination pagination-sm no-margin pull-right">
                    <li>Total Records : $data.totalRecords</li>&nbsp;&nbsp;
                    <li> | Total Pages : $data.TotalPages</li>&nbsp;&nbsp;
                    <li> | Current Page : $data.pageNumber</li>&nbsp;&nbsp;
         </ul>
         </div>

I am not very experienced with Node.js and Cassandra, so this solution can surely be improved. Solution 1 is a working code example to get started with the paging idea. Cheers