
My input is a stream of structured rows arriving at a write rate of 10K per second. Each row has 20 columns. Several queries need to be answered on this data. Because most of the queries need a different WHERE, GROUP BY or ORDER BY, the final data model ended up like this:

primary key for table of query1 : ((column1,column2),column3,column4)
primary key for table of query2 : ((column3,column4),column2,column1)
  and so on
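To make the layout concrete, here is a minimal sketch that generates the DDL for the first two query tables. The table names (`rows_by_query1`, `rows_by_query2`) and the `text` column type are assumptions made only for illustration; the real schema would use its own names and types.

```python
# Hypothetical sketch: table names and the "text" column type are assumptions.
ALL_COLUMNS = [f"column{i}" for i in range(1, 21)]  # each row has 20 columns


def create_table_cql(table, partition_key, clustering):
    """Build a CREATE TABLE statement for one query-specific table."""
    cols = ",\n  ".join(f"{c} text" for c in ALL_COLUMNS)
    pk = f"(({', '.join(partition_key)}), {', '.join(clustering)})"
    return f"CREATE TABLE {table} (\n  {cols},\n  PRIMARY KEY {pk}\n);"


# One entry per query: (partition key columns, clustering columns).
QUERY_TABLES = {
    "rows_by_query1": (["column1", "column2"], ["column3", "column4"]),
    "rows_by_query2": (["column3", "column4"], ["column2", "column1"]),
}


def all_ddl():
    """Return the DDL statements for every query table, in declaration order."""
    return [create_table_cql(name, pk, ck) for name, (pk, ck) in QUERY_TABLES.items()]


if __name__ == "__main__":
    for statement in all_ddl():
        print(statement)
```

Every table holds the same 20 columns; only the primary key changes, which is why each input row has to be inserted once per table.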

I am aware of the limit on the number of tables in a Cassandra data model (200 triggers a warning and 500 would fail).

Because every input row requires an insert into every table, the final number of writes per second becomes big * big data!:

writes per second = 10K (input)
                       * number of tables (queries)
                       * replication factor
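As a rough worked example of that formula, the sketch below multiplies the three factors. The figures of 10 tables and a replication factor of 3 are assumptions chosen only to show the scale; they are not taken from my actual cluster.

```python
def replica_writes_per_second(input_rate, num_tables, replication_factor):
    """Total replica-level writes per second across the cluster."""
    return input_rate * num_tables * replication_factor


if __name__ == "__main__":
    # 10K rows/s comes from the question; 10 tables and RF=3 are assumed values.
    print(replica_writes_per_second(10_000, 10, 3))  # 300000
```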

The main question: am I on the right path? Is it normal to have a table for every query even when the input rate is already so high?

Shouldn't I use something like Spark or Hadoop instead of relying on the bare data model? Or even HBase instead of Cassandra?

Are you using plain Cassandra or DSE? - Alex Ott
plain cassandra - f.ald

1 Answer


It could be that Elassandra would resolve your problem.

The query system is quite different from CQL, but the duplication for indexing would automatically be managed by Elassandra on the backend. All the columns of one table will be indexed so the Elasticsearch part of Elassandra can be used with the REST API to query anything you'd like.
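To give an idea of what such a query looks like, here is a minimal sketch that builds an Elasticsearch search body with a WHERE-like filter, a GROUP BY-like terms aggregation and an ORDER BY-like sort. The base URL `http://localhost:9200`, the index name and the column values are assumptions for illustration, and the helper names are hypothetical; check your own Elassandra mapping before relying on it.

```python
import json
import urllib.request


def build_search_body(filters, group_by=None, order_by=None, size=10):
    """Build an Elasticsearch query DSL body from simple SQL-like pieces."""
    body = {
        "size": size,
        # WHERE col = value AND ... as exact-match term filters
        "query": {"bool": {"filter": [{"term": {f: v}} for f, v in filters.items()]}},
    }
    if group_by:
        # GROUP BY equivalent: bucket documents by the field's values
        body["aggs"] = {"groups": {"terms": {"field": group_by}}}
    if order_by:
        field, direction = order_by  # direction is "asc" or "desc"
        body["sort"] = [{field: {"order": direction}}]
    return body


def search(index, body, base_url="http://localhost:9200"):
    """POST the body to the _search endpoint (base_url is an assumption)."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `build_search_body({"column1": "a"}, group_by="column3", order_by=("column4", "desc"))` covers in a single request what would otherwise need a dedicated table per query.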

In one of my tests, I pushed a large amount of data (8 GB) into an Elassandra database non-stop and never hit a timeout. The search engine also remained responsive pretty much the whole time, which is more or less the workload you are describing. The docs say that it takes 5 to 10 seconds for newly added data to become available in the Elassandra indexes. I guess it will depend somewhat on your installation, but I think that's fast enough for most applications.

Using Elassandra may sound a bit hairy at first, but once it is in place, it's incredible how fast you can find results. It certainly includes a powerful WHERE equivalent. The GROUP BY is a bit difficult to put in place. The ORDER BY is simple enough; however, when (re-)ordering you lose some speed, which is something to keep in mind. In my tests, though, even the ORDER BY equivalents were very fast.