So I want to write data, which is coded as a JSON string into a Cassandra table. I did the following steps:
- Create a Cassandra table containing columns with all the attributes of my JSON string. Here is the cql for that:
CREATE TABLE on_equipment (
ChnID varchar,
StgID varchar,
EquipID varchar,
SenID varchar,
value1 float,
value2 float,
value3 float,
electric_consumption float,
timestamp float,
measurement_location varchar,
PRIMARY KEY ((timestamp))
) WITH comment = 'A table for the on equipment readings';
- Write a python Cassandra client to write the data into Cassandra from a JSON payload. Here is the code snippet to make the INSERt query (msg.value is the json string):
session.execute('INSERT INTO ' + table_name + ' JSON ' + "'" + msg.value + "';")
I get no writing errors when doing this.
However, I ran into a problem:
The JSON data I have is from IoT sources, and one of the attributed I have is a unix timestamp. An example of a JSON record is as follows (notice the timestamp attribute):
{'timestamp': 1598279069.441547, 'value1': 0.36809349674042857, 'value2': 18.284579388599308, 'value3': 39.95615809003724, 'electric_consumption': 1.2468644044844224, 'SenID': '1', 'EquipID': 'MID-1', 'StgID': '1', 'ChnID': '1', 'measurement_location': 'OnEquipment'}
In order to insert many records, I have defined the timestamp value as the primary key of the data in the Cassandra table. The problem is that not all records are being written into Cassandra, only records who's timestamps fall into a certain group. I know this because I have produced around 100 messages and received zero write errors, yet the contents of the table only has 4 rows:
timestamp | chnid | electric_consumption | equipid | measurement_location | senid | stgid | value1 | value2 | value3
------------+-------+----------------------+---------+----------------------+-------+-------+----------+----------+----------
1.5983e+09 | 1 | 0.149826 | MID-1 | OnEquipment | 1 | 1 | 0.702309 | 19.92813 | 21.47207
1.5983e+09 | 1 | 1.10219 | MID-1 | OnEquipment | 1 | 1 | 0.141921 | 5.11319 | 78.17094
1.5983e+09 | 1 | 1.24686 | MID-1 | OnEquipment | 1 | 1 | 0.368093 | 18.28458 | 39.95616
1.5983e+09 | 1 | 1.22841 | MID-1 | OnEquipment | 1 | 1 | 0.318357 | 16.9013 | 71.5506
In other words, Cassandra is updating the values of these four rows, when it should be writing all the 100 messages.
My guess is that I incorrectly using the Cassandra primary key. The timestamp column is type float.
My questions: Does this behaviour make sense? Can you explain it? What can I use as the primary key to solve this? Is there a way to make the primary key a Cassandra writing or arrival time?
Thank you in advance for your help!