4
votes
CREATE TABLE users (
 userId uuid,
 firstname varchar,
 mobileNo varchar,
 PRIMARY KEY (userId)
);
CREATE TABLE users_by_firstname (
 userId uuid,
 firstname varchar,
 mobileNo varchar,
 PRIMARY KEY (firstname,userId)
);

I have 100 rows in these tables. I want to get randomly selected 10 rows each time.

In MySQL

select * from users order by RAND() limit 10;

In Cassandra

select * from users limit 10;
select * from users_by_firstname limit 10;

But from 1st table I would get the static 10 rows sorted by the generated hash of the partition key (userId).

From the second one I would get the static 10 rows sorted by userId. But it will not be random if the data does not change.

Is there any way to get random rows each time in Cassandra.

Thanks
Chaity

1

1 Answers

13
votes

It's not possible to archive this directly. There are possibilities to emulate this (this solution is not really random, but you should receive different values), but it's not really a perfect idea.

What you could do is, create a random value in the cassandra token range -2^63 - 2^64. With this random value you can perform such a query:

select * from users_by_firstname where token(userId) > #generated_value# limit 10;

Using this method you can define a random 'starting point' from where you can receive 10 users. As I said, this method is not perfect and it certainly needs some thoughts on how to generate the random token. An edge case could be, that your random value is so far on one side of the ring, that you would receive less than 10 values.

Here is a short example:

Lets say you have a users table with the following users:

 token(uuid)          | name
----------------------+---------
 -2540966642987085542 |    Kate
 -1621523823236117896 | Pauline
 -1297921881139976049 |  Stefan
  -663977588974966463 |    Anna
  -155496620801056360 |    Hans
   958005880272148645 |     Max
  3561637668096805189 |    Doro
  5293579765126103566 |    Paul
  8061178154297884044 |   Frank
  8213365047359667313 |   Peter

Lets now say you generate the value 42 as a start-token, the select would be

select token(uuid), name from test where token(uuid) > 42 limit 10;

In this example the result would be

 token(id)           | name
---------------------+-------
  958005880272148645 |   Max
 3561637668096805189 |  Doro
 5293579765126103566 |  Paul
 8061178154297884044 | Frank
 8213365047359667313 | Peter

This method might be a reasonable approach if you have a lot of data, and a balanced cluster. To make sure you don't run into these edge case you could limit the range to not come near the edges of the cassandra token range.