Cassandra Table Modeling

Question

Imagine a table with thousands of columns, where most data in the row record is null. One of the columns is an ID, and this ID is known upfront.

select id,SomeRandomColumn 
from LotsOfColumnsTable
where id = 92e72b9e-7507-4c83-9207-c357df57b318;

SomeRandomColumn is one of thousands, and in most cases the only column with data. SomeRandomColumn is NOT known upfront as the one that contains data.

Is there a CQL query that can do something like this.

select {Only Columns with data}  
from LotsOfColumnsTable  
where id = 92e72b9e-7507-4c83-9207-c357df57b318;

I was thinking of putting in a "hint" column that points to the column with data, but that feels wrong unless there is a CQL query that looks something like this with one query;
```
select ColumnHint.{DataColumnName}
from LotsOfColumnsTable  
where id = 92e72b9e-7507-4c83-9207-c357df57b318;
```

In MongoDB I would just have a collection and the document I got back would have a "Type" attribute describing the data. So perhaps my real question is how do I replicate what I can do with MondoDB in Cassandra. My Cassandra journey so far is to create UDT's for each unique document, followed by altering the table to add this new UDT as a column. My starter table looks like this where ColumnDataName is the hint;

CREATE TABLE IF NOT EXISTS WideProductInstance (
   Id uuid,
   ColumnDataName text
   PRIMARY KEY (Id)
);

Thanks

Cassandra hasn't the concept of null values. If you have an empty column in a row, it's empty. You can perform SELECT *. Cassandra will return only the existing data. I don't understand why you need more? — Citrullin

doanduyhai doanduyhai · Accepted Answer · 2016-04-18T18:39:01

Is there a CQL query that can do something like this.

select {Only Columns with data}
from LotsOfColumnsTable
where id = 92e72b9e-7507-4c83-9207-c357df57b318;

No, you cannot do that. And it's pretty easy to explain. To be able to know that a column contains data, Cassandra will need to read it. And if it has to read the data, since the effort is already spent on disk, it will just return this data to the client.

The only saving you'll get if Cassandra was capable of filtering out null column is on the network bandwidth ...

I was thinking of putting in a "hint" column that points to the column with data, but that feels wrong unless there is a CQL query that looks something like this with one query;

Your idea is like storing in another table a list of all column that actually contains real data and not null. It sounds like a JOIN which is bad and not supported. And if you need to read this reference table before reading the original table, you'll have to read at many places and it's going to be expensive

So perhaps my real question is how do I replicate what I can do with MondoDB in Cassandra.

Don't try to replicate the same feature from Mongo to Cassandra. The two database have fundamentally different architecture. What you have to do is to reason about your functional use-case. "How do I want to fetch my data from Cassandra ?" and from this point design a proper data model. Cassandra data model is designed by query.

The best advice for you is to watch some Cassandra Data Model videos (it's free) at http://academy.datastax.com

Cassandra Table Modeling

1 Answers