3
votes

I'm currently trying to migrate a project from Cassandra's Thrift interface (which I've grown to love) to CQL. I have the following "issue" that is bothering me. I create an events table:

CREATE TABLE events (
  eventKey uuid PRIMARY KEY,
  type decimal,
  severity decimal,
  source inet
);

When I stick to CQL, everything looks great. To understand it better, though, I had a look at the table in the CLI and found that for every entry (Thrift row) I have one empty column:

[default@test] list events;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: a9ddffba-3c30-4119-add8-966dddb38490
=> (name=, value=, timestamp=1396364167269000)
=> (name=type, value=0000000001fe, timestamp=1396364167269000)
=> (name=severity, value=00000000ff, timestamp=1396364167269000)
=> (name=source, value=0000000000000065, timestamp=1396364167269000)
-------------------
RowKey: a9ddffba-3c30-4119-add8-966aaab384a0
=> (name=, value=, timestamp=1396363462812000)
=> (name= source, value=0000000000000065, timestamp=1396363462812000)
-------------------
RowKey: a9ddffba-3c30-4119-add8-966aaab38490
=> (name=, value=, timestamp=1396364010098000)
=> (name= source, value=0000000000000066, timestamp=1396364010098000)

As far as I understand, this is because I'm not using COMPACT STORAGE, and hence Cassandra creates a column as a sort of index for the column name. That would make perfect sense if I were using wide rows, i.e. if my primary key looked like (eventKey, type, severity), for example. However, I don't see why I need it in this situation, and while it's just an empty column, it still generates some additional and (I think) unnecessary data volume. Any thoughts on that? Am I missing something?

If I use compact storage, the empty column isn't created. But then I can't really change the schema, since I get "Bad Request: Cannot drop columns from a COMPACT STORAGE table" -- can someone explain that to me too? Especially since it's possible from within the CLI, and I keep reading that in CQL you can do everything you can do in Thrift and even more.


1 Answer

2
votes

Those "empty" columns are "CQL row" markers, which Cassandra uses internally. And yes, they require additional storage, but they give you the flexibility to alter your schema later.
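To illustrate what the marker is for, here is a minimal sketch against the events table from the question. The UUID is an arbitrary example value, not one from your data. As I understand it, the marker is what lets a CQL row exist even when only its primary key has been set:

```cql
-- Insert a row that sets nothing but the primary key.
-- (The UUID below is a made-up example value.)
INSERT INTO events (eventKey)
VALUES (123e4567-e89b-12d3-a456-426655440000);

-- The row should still come back, with type, severity and source null.
-- On the Thrift side, the empty-named marker cell is the only thing
-- stored for this row.
SELECT * FROM events
WHERE eventKey = 123e4567-e89b-12d3-a456-426655440000;
```

Without the marker there would be no cell at all for that key, so there would be nothing to tell Cassandra that the row exists.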

With COMPACT STORAGE the marker column is not written, so there is no such storage overhead, but your table's schema will be set in stone.
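A rough sketch of that trade-off, reusing the columns from the question. The table name events_compact is hypothetical, and the error text is the one quoted in the question:

```cql
-- Same columns as the events table, but stored without row markers.
-- (events_compact is a made-up name for this example.)
CREATE TABLE events_compact (
  eventKey uuid PRIMARY KEY,
  type decimal,
  severity decimal,
  source inet
) WITH COMPACT STORAGE;

-- Trying to change the schema afterwards is rejected:
ALTER TABLE events_compact DROP severity;
-- Bad Request: Cannot drop columns from a COMPACT STORAGE table
```

On the non-compact events table, the same ALTER TABLE ... DROP statement should be accepted on Cassandra versions that support dropping columns.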

There is a good explanation available at http://www.datastax.com/dev/blog/thrift-to-cql3