
I understand that Cassandra is a NoSQL database and that patching it with many indices is not the way to go, but here I'm looking at a solution for my analytics cluster, not for the production/real-time one.

So I think it makes sense to add indices to reduce the amount of data filtered by Spark.
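To make the idea concrete, here is a minimal sketch of the Spark side, assuming the DataStax spark-cassandra-connector and a hypothetical `analytics.events` table with a `country` column (all names and the contact point are placeholders). With a secondary index on `country`, the connector can push the predicate down to Cassandra instead of pulling the whole table into Spark:

```scala
import org.apache.spark.sql.SparkSession

object IndexedRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-analytics")
      // placeholder contact point for the analytics DC
      .config("spark.cassandra.connection.host", "10.0.0.1")
      .getOrCreate()

    // Load the (hypothetical) analytics.events table as a DataFrame
    val events = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "events"))
      .load()

    // With a secondary index on `country`, this equality predicate can be
    // pushed down to Cassandra, so only matching rows are shipped to Spark.
    val frEvents = events.filter("country = 'FR'")

    frEvents.explain() // pushed filters show up in the physical plan
    println(frEvents.count())

    spark.stop()
  }
}
```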

How do native Cassandra secondary indices compare to Lucene's indices?

Many features are not available with Cassandra's native indexing alone, but what about the things you can do with both?

Is it better / does it make sense to only use Lucene?

Another advantage that I see is that I can install Lucene only on my analytics cluster, without overloading the real-time one with indices (and therefore improving the write performance on that side).

What exactly is your analytics use case, and why do you think you need a NoSQL store as the storage layer for Spark? Will Spark perform any writes to this storage? Do you need search capabilities on the data (Lucene), or just processing? In short, please provide some more information... - Moshe Eshel
Spark might do some writes, but that is not the most common use case. I do not need "search" capabilities, but rather WHERE-predicate capabilities. - Cedric H.

1 Answer


Don't bother with Lucene integration

Since Cassandra 3.4, there is a new secondary index implementation called SASI (SSTable Attached Secondary Index) that offers full-text search and is quite performant.

Read this: https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
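For illustration, a minimal sketch of creating and querying a SASI index through the DataStax Java driver, assuming a hypothetical `analytics.events` table with a text column `message` (the contact point and all names are placeholders):

```scala
import com.datastax.driver.core.Cluster
import scala.collection.JavaConverters._

object SasiExample {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("10.0.0.1").build()
    val session = cluster.connect()

    // SASI index in CONTAINS mode with a standard analyzer, which enables
    // LIKE-style full-text predicates on a regular text column.
    session.execute(
      """CREATE CUSTOM INDEX IF NOT EXISTS events_message_idx
        |ON analytics.events (message)
        |USING 'org.apache.cassandra.index.sasi.SASIIndex'
        |WITH OPTIONS = {
        |  'mode': 'CONTAINS',
        |  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
        |  'case_sensitive': 'false'
        |}""".stripMargin)

    // The index serves substring matches without ALLOW FILTERING.
    val rows = session.execute(
      "SELECT message FROM analytics.events WHERE message LIKE '%timeout%'")
    for (row <- rows.asScala) println(row.getString("message"))

    session.close()
    cluster.close()
  }
}
```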