6
votes

We want to use cassandra to store complex data
but we can't figure out how to organize indexes.

Our table (column family) looks like this:

Users =
  { 
    RandomId int,
    Firstname varchar,
    Lastname varchar,
    Age int,
    Country int,
    ChildCount int
  }

We have queries with mandatory fields (Firstname, Lastname, Age) and extra search options (Country, ChildCount).
How should we organize the index to make this kind of queries faster?

First I thought, it would be natural to make composite index on (Firstname, Lastname, Age) and add separate secondary index on remaining fields (Country and ChildCount).
But I can't insert rows into table after creating secondary indexes and I can't query the table.

Using

  • cassandra 1.1.0
  • cqlsh with --cql3 option.

Any other suggestions to solve our problem (complex queries with mandatory and additional options) are welcome.

2

2 Answers

2
votes

This is my idea. You could simply create a column family with your RandomId as the row key and all the remaining fields simply as columns (e.g. column name 'firstname', column value 'jonh'). After this you have to create a secondary index for each of these columns. The cardinality of your values seems to be low so it should be slightly efficient.

THe CQL code should be something like:

create column family users with comparator=UTF8Type and column_metadata=[{column_name:  firstname, validation_class: UTF8Type,index_type: KEYS},
{column_name: lastname, validation_class: UTF8Type, index_type: KEYS},
{column_name: contry, validation_class: IntegerType, index_type: KEYS},
{column_name: age, validation_class: IntegerType, index_type: KEYS]},
{column_name: ChildCount, validation_class: IntegerType, index_type: KEYS]];

A good reference for it could be http://www.datastax.com/docs/0.7/data_model/secondary_indexes

Let me know if I'm wrong;

1
votes

For queries involving a large number of partitions indices are not very efficient.

I think it is better to think the tables based on the queries you'd want to make: you want a table for queries based on user name and that seems like the right place to store all the info concerning the user. On the other hand you want to be able to search based on country, I assumed, to provide a list of users: for that you don't really need all the info, maybe just the first and last names, or just the email, etc. Another table could do it then.

This involves some data duplication but that better fits the Cassandra data modelling ideas.

This would give:

CREATE TABLE users(
   id UUID,
   lastname TEXT,
   firstname TEXT,
   age INT,
   country TEXT,
   childcount INT,
   PRIMARY KEY(UUID)
);

CREATE TABLE users_by_country(
   country TEXT,
   firstname TEXT,
   lastname TEXT,
   user_uuid UUID,
   PRIMARY KEY((country), firstname, lastname)
);

CREATE TABLE users_by_age(
   age INT,
   firstname TEXT,
   lastname TEXT,
   user_uuid UUID,
   PRIMARY KEY((age), firstname, lastname)
);