0
votes

I need a suggestion on designing a Cassandra table schema. I have a created a table like this:

CREATE TABLE sams.events (
    addedtime timestamp,
    hostname text,
    appname text,
    eventtime timestamp,
    PRIMARY KEY (addedtime, hostname)
) WITH CLUSTERING ORDER BY (hostname ASC)

Now these are my requirements:

1) I should be able to make range queries via addedtime, like from x date to y date

2) I should be able to query by appname and order the rows in ascending order using addedtime

How can I achieve this? I am ok to change the table schema.

Adding I have created Cassandra cluster of 2 DC and 3 nodes each.

2

2 Answers

1
votes

You mentioned you have only 2 apps. How many hostname you have, is it equal to or greater than number of nodes in the cluster ? If yes, then you can try the following which can give you even spread of data.

CREATE TABLE mykeyspace.events (
appname text,
hostname text,
addedtime timeuuid,
eventtime timeuuid,
PRIMARY KEY ((appname, hostname), addedtime)
);

insert into events (appname, hostname , addedtime , eventtime ) values ('app1','host1',now(), now());
insert into events (appname, hostname , addedtime , eventtime ) values ('app1','host1',now(), now());
insert into events (appname, hostname , addedtime , eventtime ) values ('app1','host2',now(), now());
insert into events (appname, hostname , addedtime , eventtime ) values ('app1','host3',now(), now());
insert into events (appname, hostname , addedtime , eventtime ) values ('app1','host4',now(), now());

Query1: Range query by added time (hoping your number of hostname is not a higher number, otherwise it will be big in clause)

select * from events where appname = 'app1' and hostname in ('host1','host2') and addedtime > maxTimeuuid('2015-08-23 00:46:00-0500') and addedtime < minTimeuuid('2015-08-23 00:49:19-0500') ;

Query 2: By App name (again hoping your number of hostname is not a higher number)

select appname,hostname,dateOf(addedtime) from events where appname = 'app1' and hostname in ('host1','host2');

NOTE: IN clause does not support ordering of data using a query.

0
votes

You'll probably need two tables to handle those queries.

To do a range query on addedtime, you'll need a table where addedtime is a clustering column rather than as a partition key as you show in your example. That's because you can only do range queries on clustering columns. So if you want to do the range query for each hostname, then you could have hostname as the partition key, like this:

CREATE TABLE sams.events_by_hostname (
    addedtime timestamp,
    hostname text,
    appname text,
    eventtime timestamp,
    PRIMARY KEY (hostname, addedtime));

SELECT * FROM events_by_hostname WHERE hostname='host1' 
    AND addedtime >= '2015-08-22 14:00:39-0400' AND addedtime < '2015-08-22 14:00:41-0400';

 hostname | addedtime                | appname | eventtime
----------+--------------------------+---------+--------------------------
    host1 | 2015-08-22 14:00:39-0400 |    app1 | 2015-08-22 14:00:39-0400

To query by appname, you'll want a table with appname as the partition key and addedtime as the clustering column, like this:

CREATE TABLE sams.events_by_appname (
    addedtime timestamp,
    hostname text,
    appname text,
    eventtime timestamp,
    PRIMARY KEY (appname, addedtime));

SELECT * from events_by_appname WHERE appname='app1';

 appname | addedtime                | eventtime                | hostname
---------+--------------------------+--------------------------+----------
    app1 | 2015-08-22 14:06:45-0400 | 2015-08-22 14:06:45-0400 |    host1
    app1 | 2015-08-22 14:09:33-0400 | 2015-08-22 14:09:33-0400 |    host1

To keep the two tables consistent, you can do your inserts using a batch, like this:

BEGIN BATCH 
    INSERT INTO events_by_hostname (hostname, addedtime, appname, eventtime )     
        VALUES ( 'host1', dateof(now()), 'app2', dateof(now()));
    INSERT INTO events_by_appname (hostname, addedtime, appname, eventtime ) 
        VALUES ( 'host1', dateof(now()), 'app2', dateof(now()));
APPLY BATCH;

In Cassandra 3.0, which will soon be in beta release, you'll be able to create the second table as a view of the first table. Then when you modify the first table, Cassandra will update the second table automatically.