I am trying to build a mental model of Cassandra's data model. What I have so far is that the basic unit of data is a column (name, value, timestamp). A super-column can contain several columns (it has a name, and its value is a map). An example of a ColumnFamily (which I suppose contains several entries of data, or rows) is:

UserProfile = { // this is a ColumnFamily
    phatduckk: {   // this is the key to this row inside the CF
        username: "phatduckk", // column
        email: "[email protected]", // column
        phone: "(900) 976-6666" // column
    }, // end row
    ieure: {   // the key to another row in the same CF
        username: "ieure",
        email: "[email protected]",
        phone: "(888) 555-1212",
        age: "66", // a column the previous row does not have
        gender: "undecided" // a column the previous row does not have
    },
}

Question 1- To me it seems that a row in a CF is nothing but a key-value pair where the value is a super-column. Am I correct?

Question 2- Could the value (of a row key) be a map of several super columns? What I am thinking is, say I want to create a row with a user's name and address; then the row could be a key (user id) whose value maps to two super columns, C1 (first name, last name) and C2 (street, country).

1 Answer

I think you're trying to wrap your head around the (very) old nomenclature, which has since been renamed to make it less confusing.

Table

{
  partition key: {          // partition
    clustering: {           // row
       key: value           // column
       key2: value          // column
       key3: value          // column
    }
    clustering2: {          // row
       key: value           // column
       ...
    }
    ...
  }
  ...
}

Partitions are ordered by the Murmur3 hash of the partition key, and that hash is also used to determine which hosts are replicas. The clustering keys are sorted within each partition, and there is a fixed schema for the fields within a row, each of which is a column.
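If it helps, here is a rough Python sketch of that nesting (partition key, then clustering key, then columns). It is only a mental-model aid: the `Partition` class and the `upsert` and `rows` names are made up for illustration, it is not how Cassandra stores data on disk, and it leaves out the Murmur3 hashing that orders partitions and places them on replicas.

```python
import bisect


class Partition:
    """Hypothetical stand-in for one partition: rows kept sorted by clustering key."""

    def __init__(self):
        self._keys = []  # clustering keys, always kept in sorted order
        self._rows = {}  # clustering key -> {column name: value}

    def upsert(self, clustering, columns):
        # A new clustering key creates a new row at its sorted position;
        # an existing one just has its columns overwritten or extended.
        if clustering not in self._rows:
            bisect.insort(self._keys, clustering)
            self._rows[clustering] = {}
        self._rows[clustering].update(columns)

    def rows(self):
        # Rows come back in clustering order, not insertion order.
        return [(key, self._rows[key]) for key in self._keys]


# The "table" is then just: partition key -> Partition.
table = {"partition key": Partition()}
table["partition key"].upsert("clustering2", {"key": "value"})
table["partition key"].upsert("clustering", {"key": "value", "key2": "value"})
table["partition key"].upsert("clustering", {"key3": "value"})

for clustering_key, columns in table["partition key"].rows():
    print(clustering_key, columns)
```

Note that `clustering` is listed before `clustering2` even though it was written second, which is the "sorted within the partition" part.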

Using the super column family, column family, super column, column, and row nomenclature is just going to confuse you when you read anything that has come out in the last 6 years. Thrift has also been deprecated, for what it's worth, so don't plan your application around it.

For your questions

Question 1- To me it seems that a row in a CF is nothing but a key-value pair where the value is a super-column. Am I correct?

Yes, but the super columns are sorted by their keys, i.e. phatduckk would come after ieure if they are text types using ascending order. That way you can read a slice of names between ph and pk, for instance, and pull them off disk (more useful when clustering on a timestamp and looking for ranges of data).
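To make the slice idea concrete, here is a small Python sketch that uses a sorted list as a stand-in for sorted keys. The `slice_keys` helper is hypothetical and only illustrates why sorted keys make range reads cheap; with text keys in ascending order, ieure sorts before phatduckk.

```python
import bisect


def slice_keys(sorted_keys, start, end):
    """Return every key k with start <= k < end from an already sorted list."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    # Because the keys are sorted, the matches form one contiguous run.
    return sorted_keys[lo:hi]


names = sorted(["phatduckk", "ieure"])
print(names)                          # ['ieure', 'phatduckk']
print(slice_keys(names, "ph", "pk"))  # ['phatduckk']
```

The same pattern applies when the keys are timestamps: a time range maps to one contiguous run of rows.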

Question 2- Could the value (of a row key) be a map of several super columns? What I am thinking is, say I want to create a row with a user's name and address; then the row could be a key (user id) whose value maps to two super columns, C1 (first name, last name) and C2 (street, country).

You should really look at some newer documentation. I think you have the right idea, but it is hard to relate it exactly to how C* works now. Try starting with

https://academy.datastax.com/resources/ds101-introduction-cassandra
https://academy.datastax.com/resources/ds220-data-modeling

as some free courses that do a good job explaining.
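As a rough illustration for Question 2, here is one way the name-and-address case could be pictured in the newer model, again as plain Python rather than real Cassandra code. It assumes a table where the user id is the partition key and there is no clustering column, so each partition holds a single row and the four fields are ordinary columns. The column names, the `user-1` id, and the placeholder values are all invented for the example.

```python
# Hypothetical "users" table: partition key (user id) -> one row of columns.
users = {
    "user-1": {
        "first_name": "<first name>",
        "last_name": "<last name>",
        "street": "<street>",
        "country": "<country>",
    },
}


def read_columns(user_id, wanted):
    """Pick a subset of columns from the single row stored under user_id."""
    row = users[user_id]
    return {name: row[name] for name in wanted}


# The C1 / C2 grouping from the question becomes a choice of which columns to read.
name_part = read_columns("user-1", ["first_name", "last_name"])
address_part = read_columns("user-1", ["street", "country"])
print(name_part)
print(address_part)
```

Whether a flat row like this suits your application depends on the queries you need, which is what the data modeling course covers.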