0
votes

I have read the following article about Cassandra CQL3 and Thrift API http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

In the article, they give an example on creating a scheme for gathering data from sensors. They show a “wide row” solution by making the timestamp as a column. Cassandra's strength, as I see it is by supporting 2 billion columns and a fast way to extract data according to column.

In the article, with CQL3 they build a table

CREATE TABLE data (
  sensor_id int,
  collected_at timestamp,
  volts float,
  PRIMARY KEY (sensor_id, collected_at)
) WITH COMPACT STORAGE;

which translates to:

 sensor_id | collected_at             | volts
         1 | 2013-06-05 15:11:00-0500 |   3.1
         1 | 2013-06-05 15:11:10-0500 |   4.3
         1 | 2013-06-05 15:11:20-0500 |   5.7
         2 | 2013-06-05 15:11:00-0500 |   3.2
         3 | 2013-06-05 15:11:00-0500 |   3.3
         3 | 2013-06-05 15:11:10-0500 |   4.3

In Thrift it translates to:

list data;
RowKey: 1

=> (cell=2013-06-05 15:11:00-0500, value=3.1, timestamp=1370463146717000)
=> (cell=2013-06-05 15:11:10-0500, value=4.3, timestamp=1370463282090000)
=> (cell=2013-06-05 15:11:20-0500, value=5.7, timestamp=1370463282093000)

RowKey: 2

=> (cell=2013-06-05 15:11:00-0500, value=3.2, timestamp=1370463332361000)

RowKey: 3

=> (cell=2013-06-05 15:11:00-0500, value=3.3, timestamp=1370463332365000)
=> (cell=2013-06-05 15:11:10-0500, value=4.3, timestamp=1370463332368000)

I'm trying to think of a Cassandra Schema example for the following sensor data gathering problem. Let's say I add a new set of sensors which have a bigint (long) value (instead of float).

Any ideas how to design such a table schema to include both sensor types with different data types yet keep the columns based on timestamp?

Thanks, Guy

1

1 Answers

3
votes

If you don't need to use COMPACT STORAGE (and backwards compatibility with Thrift), just create your table as

CREATE TABLE data (
  sensor_id int,
  collected_at timestamp,
  other_field bigint,
  volts float,
  PRIMARY KEY (sensor_id, collected_at)
)

Cassandra supports sparse columns with basically no overhead, so if you programmatically decide to only populate one of the two fields for any cql row, you will achieve your goal.

Alternatively, you can continue to use COMPACT STORAGE and just switch to blob type. The blob type will do absolutely no interpretation or transformation of the bytes that you insert into it, so accuracy can be guaranteed. I would not recommend using a text type for this.