Usage of Cassandra 'com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex' in CQL queries

Question

com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex is a custom Cassandra index type introduced by DataStax for Solr integration. My main question is: can CQL queries use these custom indexes at all? I've tried a few CQL queries with filters on the indexed columns, but they always end with an RPC timeout.

My use case:

I have a table whose queries usually filter on multiple columns. Since Cassandra's native secondary indexes can only be defined on one column at a time (i.e. one index = one column) and only one index can be used by any given CQL query, I figured that I can't fulfill my application's read requirements using CQL. This is why I resorted to Solr for ALL read operations: Solr can filter on multiple columns at once. This works fine for most cases, BUT I have two queries that turned out to be too heavy for Solr.

Now I want to try Spark because I've read about its amazing analytics capabilities. However, I hit a blocker: Spark relies on the CQL "WHERE" clause to filter the data that will be loaded from Cassandra into Spark. And since CQL queries seemingly can't use Cql3SolrSecondaryIndex for reads, I don't know how else I can load my data into Spark. I'm aware that filtering on the Cassandra server side is not compulsory when loading data from Cassandra into Spark, but in my case it is required because the table is too big (approx. 4 billion records spread across 6 nodes at RF=2). I tried to define a native Cassandra index on one of the columns that I intend to filter on, but Cassandra threw an error saying that an index already exists for that column (i.e. the Cql3SolrSecondaryIndex index).

As it appears to me now, DSE forces me to choose between Solr and Spark: if I include a column in the Solr core, a Cql3SolrSecondaryIndex index is defined on that column and I can no longer define a native Cassandra index on it. Without a native Cassandra index, CQL queries cannot filter on that column. Without server-side CQL filtering, Spark would choke trying to load all 4 billion rows and would likely trigger an OOM.

Is my impression correct? Is there a workaround?

phact · Accepted Answer · 2014-09-22T14:14:47

You can use Solr indexes from CQL by issuing CQL Solr queries. This is not recommended for production use (stick to the HTTP API for that), but in your case it may be your best bet.

The syntax is as follows:

    SELECT ...
    FROM ...
    WHERE solr_query = 'search expression'
    [LIMIT ....]

Your search expression should conform to Lucene syntax.
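To illustrate how filters on several columns can fold into a single solr_query expression, here is a minimal Python sketch that only builds the CQL text; it does not talk to a cluster. The keyspace, table, and column names are hypothetical, the helper names are made up for this example, and it assumes each filter value is a single token with no whitespace.

```python
# Sketch only: builds the CQL string for a solr_query lookup.
# Keyspace, table, and column names passed in are hypothetical placeholders.

# Characters with special meaning in Lucene query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(value):
    """Backslash-escape Lucene special characters in a single-token value."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in value)


def build_solr_query(filters):
    """AND together one field:value clause per filtered column."""
    return " AND ".join(
        "{}:{}".format(field, escape_term(str(value)))
        for field, value in filters.items()
    )


def build_cql(keyspace, table, filters, limit=None):
    """Wrap the Lucene expression in a CQL SELECT using solr_query."""
    # CQL string literals escape a single quote by doubling it.
    expr = build_solr_query(filters).replace("'", "''")
    cql = "SELECT * FROM {}.{} WHERE solr_query = '{}'".format(keyspace, table, expr)
    if limit is not None:
        cql += " LIMIT {}".format(int(limit))
    return cql
```

For example, build_cql("ks", "events", {"status": "active", "region": "eu"}, limit=10) yields the statement SELECT * FROM ks.events WHERE solr_query = 'status:active AND region:eu' LIMIT 10, which filters on two indexed columns in one query.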

Here is the link to the DataStax documentation: http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/srch/srchCql.html
