Confusion on Cassandra data modeling

Question

I've spent the last couple of days browsing online articles, videos, and even Stack posts to understand how to model data in Cassandra. I understand that one needs to model the data according to query patterns, but what I don't understand is the relationship between column families and columns in Cassandra, and whether it applies to the way I want to query data.

I have a relational database table that consists of the following columns:

CUST_ID | ACCT_ID | CUST_ADDRS | ACCT_ADDRS | CUST_ST | ACCT_ST | CUST_FRAUD_IND | ACCT_DAYS_OPEN | ACCT_TYPE | CUST_CARD_IND | etc...

Essentially it's a table of customer IDs and their account IDs, so the unique key would be cust_id + acct_id. Each customer can have one or more accounts. Some attributes are per customer (address, state, whether the customer has a card, etc.), and others are per account (address, state, type of account, etc.).

Some of the queries we would run are: tell me whether a specific customer (CUST_ID = xxxx) has any accounts that are card accounts (ACCT_TYPE = 'CARD'), or whether a customer has any accounts open longer than 180 days.

I've looked at this link:

http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/#.VH-OezHF_6M

I'm curious about option 4, as it looks like what I should be building. In my case the table would have a key of CUST_ID and then super columns called "Card Account", "Checking Account", etc. that contain all the attributes of those accounts.

My question now is: is that the right option, and if so, how would I build that table in Cassandra? And then, how do I load data into a table that has super columns?

The article you are referring to is extremely out of date with regards to current best practices. Thrift ColumnFamilies are no longer first class citizens and the CQL interface is preferred. — RussS

Mahendra Singh · Accepted Answer · 2015-08-06T06:35:19

As you have read, a Cassandra data model must follow your query patterns, but your design does not do that yet. You have to create multiple tables, one for each query. Don't worry about data redundancy; duplicating data across tables is normal in Cassandra.
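As a rough sketch of what "one table per query" could look like for the two queries in the question: the table names, column types, and the acct_open_date column below are assumptions, not part of the original schema. The second table stores the account's open date instead of ACCT_DAYS_OPEN, because a day count would have to be rewritten every day, while a date stored as a clustering column can be range-filtered.

```cql
-- Hypothetical table for "does customer X have any CARD accounts?"
-- One partition per customer; accounts are sorted by type, then by ID.
CREATE TABLE accounts_by_customer_and_type (
    cust_id        int,
    acct_type      text,
    acct_id        int,
    cust_fraud_ind boolean STATIC,  -- customer-level value, stored once per partition
    acct_addrs     text,
    acct_st        text,
    PRIMARY KEY ((cust_id), acct_type, acct_id)
);

-- Loading data is a plain INSERT; no super columns are involved.
INSERT INTO accounts_by_customer_and_type (cust_id, acct_type, acct_id, acct_st)
VALUES (1234, 'CARD', 1, 'NY');

SELECT acct_id FROM accounts_by_customer_and_type
WHERE cust_id = 1234 AND acct_type = 'CARD';

-- Hypothetical table for "accounts open longer than 180 days".
-- The application computes the cutoff date (today minus 180 days).
CREATE TABLE accounts_by_customer_and_open_date (
    cust_id        int,
    acct_open_date timestamp,
    acct_id        int,
    acct_type      text,
    PRIMARY KEY ((cust_id), acct_open_date, acct_id)
);

SELECT acct_id, acct_open_date FROM accounts_by_customer_and_open_date
WHERE cust_id = 1234 AND acct_open_date < '2015-02-07';
```

Both tables hold the same accounts, so the application has to write each account to both of them; that duplication is the price of making each query a single-partition read.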

Cassandra structures its data as follows:

         Map<RowKeys, SortedMap<ClusteringKeys, OtherColumns>>

For example, take this table:

create table temp ( id1 int, id2 text, id3 int, id4 text, id5 int, id6 text, primary key ((id1, id2), id3, id4) );

Cassandra rows (partitions) will then be keyed by the partition key:

id1, id2

and the columns within each row will be sorted by the clustering columns:

id3, id4

So build your data model according to your queries.
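To make the effect of that primary key concrete, here is a small sketch of which queries the temp table can serve. The literal values are made up for illustration, and the exact behaviour for the rejected cases depends on the Cassandra version.

```cql
-- Full partition key (id1, id2): reads one partition.
SELECT * FROM temp WHERE id1 = 1 AND id2 = 'a';

-- Partition key plus the first clustering column.
SELECT * FROM temp WHERE id1 = 1 AND id2 = 'a' AND id3 = 10;

-- Clustering columns restricted in order; a range is allowed on the last one used.
SELECT * FROM temp WHERE id1 = 1 AND id2 = 'a' AND id3 = 10 AND id4 > 'm';

-- Not accepted as written, because the partition key is incomplete
-- (newer versions may allow it only with ALLOW FILTERING):
--   SELECT * FROM temp WHERE id1 = 1;

-- Not accepted as written, because id3 is skipped before id4
-- (newer versions may allow it only with ALLOW FILTERING):
--   SELECT * FROM temp WHERE id1 = 1 AND id2 = 'a' AND id4 = 'x';
```

If a query you need falls into the rejected group, that is usually the sign that it deserves its own table with a different primary key.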

If you want to see how Cassandra stores the data, open bin/cassandra-cli, select a keyspace with use, and run the command list table_name. The output shows how many rows there are and how many columns each row has.
