Best Indexing model in Cassandra table

Question

``(I've read A Big Data Modeling Methodology for Apache Cassandra for data modeling for my project database, which uses Cassandra. So, I use Query-Driven methodology.)

I will have a search customers as below: (This is just an example, The real page has more search parameters. Also, none of the search parameters are required-parameter.)

The sample Customers table in my Cassandra key-space: (The primary-key is selected according to mentioned article)

//---------Create Customers Table
USE testKeySpace;
CREATE TABLE IF NOT EXISTS customers(
id varint
name text
birthday date,
gender text,
education text,
PRIMARY KEY ((id,name,gender,education),birthday)
);

Questions are:

What 's the best Indexing model for this table?
How can I write a query to support optional search parameters?

In case Cassandra is actually a good fit for this task, then you should have N tables, one for each query type. Cassandra 3.0 supports materialised views which would make that easier to support. — Cedric H.
Cassandra is not the best fit for your requirements. You will have to use Solr OR elasticsearch for indexing on top of cassandra. — undefined_variable
So, according to my several search options, it's not recommended to use individual tables per query; I will have 'n!' queries! is that true? — Elnaz

bechbd bechbd · Accepted Answer · 2016-02-29T14:03:37

So based on what you have said here I think that there may be a few issues with your current table design when it comes to indexing and the ability to search. In your post you mention that none of the search parameters are required but with the table schema you currently have you will need to provide id, name, gender, and education in the WHERE clause of each request. This is due to the fact that CQL requires you to either have no WHERE clause or that the WHERE clause must contain all the PARTITION KEY fields in it. Since you have a query which is not well defined (i.e. dynamic) ahead of time you will not want to go down the table-per-query route as you do not know what the queries may be.

In this case my suggestion is that you create a table which has a PRIMARY KEY based on the most effective way to partition your data. I would then suggest you use the suggestion from Undefined_variable and setup either Solr or Elasticsearch on top of your data to provide the dynamic and full text search capabilities you are looking for. If you decide to go the Solr route you will want to look at using Datastax (Community (free) or Enterprise (paid)) as it has a nice integration with Solr (called DSE Search) out of the box. With DSE Search you can use CQL to query your Solr index like SELECT * FROM customers WHERE solr_query='name: XXXX'. You can look here to see more information:

http://docs.datastax.com/en/latest-dse/datastax_enterprise/srch/srchCql.html

Best Indexing model in Cassandra table

2 Answers