
I have a question about choosing a data modelling strategy in Cassandra.

I need to store employee data consisting of attributes such as id, name, email, date of birth, and designation, but the set of attributes will grow in the future.

The partition key is going to be id.

There are two ways to design the table:

  1. Create a wide employee table with one column per attribute:

    CREATE TABLE employee_horizontal (
        id text,
        name text,
        email text,
        dob DATE,
        designation text,
        PRIMARY KEY (id)
    ) ;
    

    But the limitation here is that I can query only by id.

  2. Create a key-value employee table with one row per attribute:

    CREATE TABLE employee_kv (
        id text,
        key text,    -- one of a predefined set of attribute names: name, email, dob, designation
        value text,
        PRIMARY KEY (id, key)
    );
    

    This table allows lookups by id and key (if that key exists for the employee).
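
To make the comparison concrete, here is a rough sketch of what each layout allows. The id `'e42'`, the email value, and the new `phone` attribute are hypothetical, chosen only for illustration.

```cql
-- Layout 1: lookup by partition key works
SELECT * FROM employee_horizontal WHERE id = 'e42';

-- Layout 1: filtering on a non-key column is rejected
-- unless a secondary index or ALLOW FILTERING is used
-- SELECT * FROM employee_horizontal WHERE email = 'jane@example.com';

-- Layout 1: a new attribute requires a schema change
ALTER TABLE employee_horizontal ADD phone text;

-- Layout 2: all attributes of one employee (single-partition read)
SELECT key, value FROM employee_kv WHERE id = 'e42';

-- Layout 2: a single attribute of one employee
SELECT value FROM employee_kv WHERE id = 'e42' AND key = 'email';

-- Layout 2: a new attribute is just another row, no schema change
INSERT INTO employee_kv (id, key, value) VALUES ('e42', 'phone', '555-0100');
```

Both layouts still need the id for every read, and in the key-value layout every value, including the date of birth, is stored as text.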

Could someone please suggest some pros and cons of these two strategies?

1 Answer


Cassandra data modeling favors denormalization, and its core principle is to design your tables around your queries. So write down your queries first. For example, suppose your queries are:

  1. Get employee by Id.
  2. Get employee by email Id.
  3. Get employee by designation.

To support these queries, create one table per query:

For Query 1,

    CREATE TABLE employee_by_id (
        id text,
        name text,
        email text,
        dob DATE,
        designation text,
        PRIMARY KEY (id)
    );
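
As a sketch, Query 1 against this table could look like the following. The id value `'e42'` is hypothetical.

```cql
-- Query 1: lookup by the full primary key; reads exactly one partition
SELECT id, name, email, dob, designation
FROM employee_by_id
WHERE id = 'e42';
```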

For Query 2,

    CREATE TABLE employee_by_email (
        id text,
        name text,
        email text,
        dob DATE,
        designation text,
        PRIMARY KEY (email)
    );
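
Query 2 would then be a direct lookup on this table. The email address below is a made-up example value.

```cql
-- Query 2: lookup by email, which is the partition key of this table
SELECT id, name, email, dob, designation
FROM employee_by_email
WHERE email = 'jane@example.com';
```

Because email is the entire primary key here, this design assumes each email belongs to only one employee. Inserting a second employee with the same email would overwrite the first row.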

For Query 3,

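    CREATE TABLE employee_by_designation (
        id text,
        name text,
        email text,
        dob DATE,
        designation text,
        -- id is added as a clustering column so that two employees with the
        -- same designation and name do not overwrite each other
        PRIMARY KEY ((designation), name, id)
    );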
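
A sketch of Query 3, using hypothetical values for the designation and name:

```cql
-- Query 3: all employees with a given designation (one partition),
-- returned sorted by the clustering column name
SELECT id, name, email
FROM employee_by_designation
WHERE designation = 'Engineer';

-- Optionally narrow the result within the partition by name
SELECT id, name, email
FROM employee_by_designation
WHERE designation = 'Engineer' AND name = 'Jane Doe';
```

All employees sharing a designation live in the same partition, so a very common designation can produce a large partition.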

This is just an example of how to model your data for Cassandra. A single table generally cannot serve all of your queries efficiently.
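
Denormalizing this way means each employee has to be written to every table. One common way to keep the copies in step is a logged batch, sketched below with made-up values. This is only one option: a logged batch ensures that all of its statements are eventually applied, but it does not give isolation across partitions and it costs more than separate writes.

```cql
-- Write the same (hypothetical) employee to all three query tables
BEGIN BATCH
    INSERT INTO employee_by_id (id, name, email, dob, designation)
    VALUES ('e42', 'Jane Doe', 'jane@example.com', '1990-04-12', 'Engineer');

    INSERT INTO employee_by_email (id, name, email, dob, designation)
    VALUES ('e42', 'Jane Doe', 'jane@example.com', '1990-04-12', 'Engineer');

    INSERT INTO employee_by_designation (id, name, email, dob, designation)
    VALUES ('e42', 'Jane Doe', 'jane@example.com', '1990-04-12', 'Engineer');
APPLY BATCH;
```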