1
votes

I have a table MACRecord in Cassandra as follows :

CREATE TABLE has.macrecord (
    macadd text PRIMARY KEY,
    position int,
    record int,
    rssi1 float,
    rssi2 float,
    rssi3 float,
    rssi4 float,
    rssi5 float,
    timestamp timestamp
)

I have 5 different nodes each updating a row based on its title i-e node 1 just updates rssi1, node 2 just updates rssi2 etc. This evidently creates null values for other columns.

I cannot seem to be able to a find a query which will give me only those rows which are not null. Specifically i have referred to this post.

I want to be able to query for example like SELECT *FROM MACRecord where RSSI1 != NULL as in MYSQL. However it seems both null values and comparison operators such as != are not supported in CQL.

Is there an alternative to putting NULL values or a special flag?. I am inserting float so unlike strings i cannot insert something like ''. What is a possible workaround for this problem?

Edit :

My data model in MYSQL was like this :

+-----------+--------------+------+-----+-------------------+-----------------------------+
| Field     | Type         | Null | Key | Default           | Extra                       |
+-----------+--------------+------+-----+-------------------+-----------------------------+
| MACAdd    | varchar(17)  | YES  | UNI | NULL              |                             |
| Timestamp | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| Record    | smallint(6)  | YES  |     | NULL              |                             |
| RSSI1     | decimal(5,2) | YES  |     | NULL              |                             |
| RSSI2     | decimal(5,2) | YES  |     | NULL              |                             |
| RSSI3     | decimal(5,2) | YES  |     | NULL              |                             |
| RSSI4     | decimal(5,2) | YES  |     | NULL              |                             |
| RSSI5     | decimal(5,2) | YES  |     | NULL              |                             |
| Position  | smallint(6)  | YES  |     | NULL              |                             |
+-----------+--------------+------+-----+-------------------+-----------------------------+

Each node (1-5) was querying from MYSQL based on its number for example node 1 "SELECT *FROM MACRecord WHERE RSSI1 is not NULL"

I updated my data model in cassandra as follows so that rssi1-rssi5 are now VARCHAR types.

CREATE TABLE has.macrecord (
    macadd text PRIMARY KEY,
    position int,
    record int,
    rssi1 text,
    rssi2 text,
    rssi3 text,
    rssi4 text,
    rssi5 text,
    timestamp timestamp
)

I was thinking that each node would initially insert string 'NULL' for a record and when an actual rssi data comes it will just replace the 'NULL' string so it would avoid having tombstones and would more or less appear to the user that the values are actually not valid pieces of data since they are flagged 'NULL'.

However i am still puzzled as to how i will retrieve results like i have done in MYSQL. There is no != operator in cassandra. How can i write a query which will give me a result set for example like "SELECT *FROM HAS.MACRecord where RSSI1 != 'NULL'" .

1
How are you planning on this to work.? You can't query on fields which are not in the key. You might have to rethink your data model.jny

1 Answers

2
votes

You can only select rows in CQL based on the PRIMARY KEY fields, which by definition cannot be null. This also applies to secondary indexes. So I don't think Cassandra will be able to do the filtering you want on the data fields. You could select on some other criteria and then write your client to ignore rows that had null values.

Or you could create a different table for each rssiX value, so that none of them would be null.

If you are only interested in some kind of aggregation, then the null values are treated as zero. So you could do something like this:

SELECT sum(rssi1) WHERE macadd='someadd';

The sum() function is available in Cassandra 2.2.

You might also be able to do some kind of trick with a user defined function/aggregate, but I think it would be simpler to have multiple tables.