Does the latest Cassandra still support individual rows having different columns?

Question

I am new to Cassandra and have been reading the official documentation. As far as I can tell, the table concept in Cassandra is very similar to that of an RDBMS.

The CQL documentation at https://cassandra.apache.org/doc/latest/cql/index.html teaches how to create a table, insert into a table, etc.

But below is from https://www.tutorialspoint.com/cassandra/cassandra_data_model.htm.

It says: "Unlike relational tables where a column family’s schema is not fixed, Cassandra does not force individual rows to have all the columns. The following figure shows an example of a Cassandra column family."

My problem is that I cannot find this design in current Cassandra. Below is a screenshot from running some simple insert commands.

Since I only inserted two columns with INSERT INTO emp (emp_id, emp_city) VALUES (5, 'abc'), the rest are set to null, which is very similar to a general RDBMS.

So could you tell me how I can implement the 'different rows have different columns' design from the first picture? Thanks very much.

Alex Ott · Accepted Answer · 2020-07-09T09:20:45

Cassandra doesn't insert null when you omit data for a specific column. The null is returned when you read the data and the value is missing. The best way to check how the data is laid out on disk is to use sstabledump. For example, for my data:

cqlsh:test> select * from test.st1;

 id | c1   | s1 | v1
----+------+----+------
 10 | null | 10 | null
  1 |    1 |  2 |    1
  1 |    2 |  2 |    1
  2 |   10 |  3 | null

(4 rows)

For the last row I can see that there is no actual data, because cells is empty:

  {
    "partition" : {
      "key" : [ "2" ],
      "position" : 97
    },
    "rows" : [
      {
        "type" : "static_block",
        "position" : 144,
        "cells" : [
          { "name" : "s1", "value" : 3, "tstamp" : "2019-04-12T14:33:47.198445Z" }
        ]
      },
      {
        "type" : "row",
        "position" : 144,
        "clustering" : [ 10 ],
        "liveness_info" : { "tstamp" : "2019-04-29T12:49:31.450239Z" },
        "cells" : [ ]
      }
    ]
  }

but if I insert null explicitly:

cqlsh:test> insert into test.st1(id, s1, c1, v1) values (3, 10, 3, null);

then I will see it in the data file as a tombstone inside the cells:

  {
    "partition" : {
      "key" : [ "3" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "static_block",
        "position" : 39,
        "cells" : [
          { "name" : "s1", "value" : 10, "tstamp" : "2020-07-09T09:19:39.751467Z" }
        ]
      },
      {
        "type" : "row",
        "position" : 39,
        "clustering" : [ 3 ],
        "liveness_info" : { "tstamp" : "2020-07-09T09:19:39.751467Z" },
        "cells" : [
          { "name" : "v1", "deletion_info" : { "local_delete_time" : "2020-07-09T09:19:39Z" }
          }
        ]
      }
    ]
  }
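
To make the difference between the two dumps easier to see, here is a minimal Python sketch that reads sstabledump-style JSON and reports, for each regular row, whether a column is absent, a tombstone, or live. It assumes the dump is a JSON array of partition objects shaped like the ones above. The name classify_cells and the trimmed DUMP string are hypothetical, for illustration only, and are not part of any Cassandra tool.

```python
import json

# Trimmed copies of the two partitions shown above (assumed to be wrapped
# in a JSON array, one object per partition).
DUMP = """
[
  {
    "partition": {"key": ["2"], "position": 97},
    "rows": [
      {"type": "static_block", "position": 144,
       "cells": [{"name": "s1", "value": 3,
                  "tstamp": "2019-04-12T14:33:47.198445Z"}]},
      {"type": "row", "position": 144, "clustering": [10],
       "liveness_info": {"tstamp": "2019-04-29T12:49:31.450239Z"},
       "cells": []}
    ]
  },
  {
    "partition": {"key": ["3"], "position": 0},
    "rows": [
      {"type": "static_block", "position": 39,
       "cells": [{"name": "s1", "value": 10,
                  "tstamp": "2020-07-09T09:19:39.751467Z"}]},
      {"type": "row", "position": 39, "clustering": [3],
       "liveness_info": {"tstamp": "2020-07-09T09:19:39.751467Z"},
       "cells": [{"name": "v1",
                  "deletion_info": {"local_delete_time": "2020-07-09T09:19:39Z"}}]}
    ]
  }
]
"""


def classify_cells(dump_text, columns=("v1",)):
    """Hypothetical helper: map (partition key, clustering) to column states."""
    result = {}
    for partition in json.loads(dump_text):
        key = tuple(partition["partition"]["key"])
        for row in partition["rows"]:
            if row.get("type") != "row":
                continue  # skip static_block entries
            cells = {cell["name"]: cell for cell in row.get("cells", [])}
            states = {}
            for col in columns:
                if col not in cells:
                    states[col] = "absent"      # nothing was written to disk
                elif "deletion_info" in cells[col]:
                    states[col] = "tombstone"   # null was written explicitly
                else:
                    states[col] = "live"
            result[(key, tuple(row.get("clustering", [])))] = states
    return result


if __name__ == "__main__":
    for row_id, states in classify_cells(DUMP).items():
        print(row_id, states)
```

Both rows return null for v1 in cqlsh, but only the second one carries a tombstone on disk. So in practice, "different rows have different columns" means simply not writing the columns a row does not need.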
