0
votes

I am running performance benchmarks to evaluate Cassandra as a database for our project. I created a table with 28 columns, a couple of which form the primary key.

I loaded the table with more than 250 million records.

When I ran queries with the primary key columns in the WHERE clause, the results were very satisfactory. With the queries parallelized across 5 threads, I could complete close to 1 million queries in 2.5 minutes.

However, when I tried queries with non-primary-key columns in the WHERE clause, 1000 queries took almost 2 hours.

I know that not querying by the primary key is a big disadvantage, but we may still face that kind of situation somewhere down the line.

  1. I tried to see whether I could use secondary indexes, but they seem to be restricted to one column only.

  2. I could not find a proper example for custom indexes, as they need an index type class.

  3. If I put all columns in the primary key, would that help, even by 5%?

  4. Is Cassandra really a good solution if we expect more query situations without primary key columns in the WHERE clause?

I strongly believe somebody must have faced this situation already, so it would be great if anyone could share their experience.

1
Can you please update the question with the exact column family schema and the query that doesn't satisfy you? – Jaya Ananthram
Hi Jaya, I do not have any additional parameters in my table creation. It is just a normal table, with a couple of the 28 columns as the primary key and one other column used for ordering. It is like: create table with all columns + PRIMARY KEY (("col1", "col6"),"col10") – Srini

1 Answer

3
votes

Is Cassandra really a good solution if we expect more query situations without primary key columns in the WHERE clause?

This is a use case where, a priori, Cassandra isn't the best solution. But with more than 250 million records, other databases will also run into performance issues.

One solution is to build your own indexes in other tables. If you don't have too many different types of WHERE clause, it should do the trick. Even if you have to issue several update or select commands to update or select a single row, each of these commands should be as fast as in the benchmark you ran.
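
To make the "own index in another table" idea concrete, here is a minimal sketch in Python that uses plain dictionaries as stand-ins for two Cassandra tables. It is only an illustration under assumptions: `col15` is a hypothetical non-key column you want to filter on, and `IndexedStore`, `upsert` and `find_by_col15` are made-up names, not part of any Cassandra driver. In a real cluster each dictionary would be a table and each dictionary access would be one CQL statement that hits a primary key.

```python
from collections import defaultdict


class IndexedStore:
    """In-memory stand-in for a base table plus one hand-maintained index table."""

    def __init__(self):
        # Base table, modelled on PRIMARY KEY (("col1", "col6"), "col10").
        self.rows = {}
        # Index table (hypothetical): keyed by col15, holding base-table keys.
        self.by_col15 = defaultdict(set)

    def upsert(self, row):
        """Write the row to the base table and keep the index table in sync."""
        key = (row["col1"], row["col6"], row["col10"])
        old = self.rows.get(key)
        if old is not None and old["col15"] != row["col15"]:
            # The indexed value changed, so drop the stale index entry.
            self.by_col15[old["col15"]].discard(key)
        self.rows[key] = dict(row)
        self.by_col15[row["col15"]].add(key)

    def find_by_col15(self, value):
        """Two key lookups: index table first, then the base table per key."""
        keys = sorted(self.by_col15.get(value, ()))
        return [self.rows[k] for k in keys]


store = IndexedStore()
store.upsert({"col1": "a", "col6": 1, "col10": 10, "col15": "red"})
store.upsert({"col1": "b", "col6": 2, "col10": 20, "col15": "red"})
# Updating an existing row moves it from the "red" index entry to "blue".
store.upsert({"col1": "a", "col6": 1, "col10": 10, "col15": "blue"})

red_rows = store.find_by_col15("red")
blue_rows = store.find_by_col15("blue")
```

The trade-off is visible in `upsert`: every write touches both structures (and may need a read of the old row first), but every read stays a key lookup rather than a scan over all records.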