2
votes

We are running a DSE 3.2.2 cluster with Cassandra and Solr enabled. The cluster has 3 nodes on virtual machines and uses a replication factor of 2.

Data is written directly to C* using a Java client with the default consistency level (recently changed to QUORUM).
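For reference, here is a minimal sketch of the quorum arithmetic behind that setting. It assumes the usual Cassandra rule that QUORUM needs floor(RF / 2) + 1 replicas; `QuorumMath` is a hypothetical helper written for illustration and is not part of any driver.

```java
// Illustrative arithmetic only; QuorumMath is a hypothetical helper, not a driver API.
final class QuorumMath {

    /** Replicas that must acknowledge a QUORUM operation: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1; // integer division floors the result
    }

    /** Replicas that may be unavailable while a QUORUM operation still succeeds. */
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.printf("RF=%d quorum=%d tolerableFailures=%d%n",
                    rf, quorum(rf), tolerableFailures(rf));
        }
    }
}
```

With a replication factor of 2, the quorum is 2, so a QUORUM write has to reach both replicas and tolerates no unavailable replica for that row.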

The issue is that when querying an index the number of documents found varies a lot. Consequently, using the stats component on some of the numeric values also produces inconsistent results.

This also happens while no data is being written. I have since manually triggered a nodetool repair on that column family, which triggered a re-index of the secondary indexes (which took some 5-6 hours). Afterwards, the results remained inconsistent.

In our use case, data that is out of date for a few seconds is not an issue, so the workaround via session stickiness does not solve it for me. The problem is that data remains inconsistent for days afterwards.

Next, a complete re-index with wiping the data is on the list, but will take some time to finish.

Update: Instead of a wipe and a re-index, I will upgrade to the latest version of C* and DSE, then run a repair, then run a re-index and report back asap (a few days at least).

Any suggestions or shared experience with query inconsistencies are greatly appreciated!

UPDATE #1

The query results are still inconsistent. Every node seems to return a different number of documents for my query. The cluster has been upgraded to 4.5.1, sstables have been upgraded, repairs executed, and the entire Solr index has been rebuilt using the full reindex trigger of the Solr GUI.

The data source table is still using the "old" compact storage option.

UPDATE #2

After the latest comments, I was not sure whether further inserts had run in the meantime. So I made sure to hold off any inserts, ran nodetool repair, and did a full rebuild of the index.

Queries seem to be OK! This seems to imply that the inconsistencies had already re-appeared after my last attempt and are the result of inserts made after the rebuild of the indexes. I will try to confirm this by starting the inserts again.

UPDATE #3

So it looks like things are stable again! The upgrade seems to have resolved the original issue, but the inconsistencies remained because of problems with the changed default transport (from http to tcp), which we found in the log files. I switched back to http, then repaired and reindexed two days ago. All inserts since then have run without any issues. Thanks for the help! I will look into the tcp<->http switch at a later time.

2
If you are inserting the data with the Java driver, make sure you check your solrvalidation.log for errors. Also check your system.log file for any indexing errors. When inserting through the Cassandra interfaces you won't get Solr errors from the insert, because the indexing happens asynchronously. - Zanson
So to recap: 1) Upgraded to 4.5.1. 2) Repaired all nodes and reindexed. 3) Did not run any further inserts. And yet, query results are still inconsistent: is that right? - sbtourist

2 Answers

4
votes

Long term index inconsistencies are mainly caused by "dropped mutations" on Cassandra nodes: those happen when a write is correctly acknowledged to the client because the required CL is satisfied, but some nodes "out of the CL" didn't actually write it, which means they didn't index it either.

This kind of inconsistency is automatically solved by Cassandra via read repair and hinted handoff, but may also require a manual node repair; in any case, reindexing doesn't help, because the inconsistency is at the database level first.
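To make the mechanism concrete, here is a toy model of it, under loudly stated assumptions: it is not DSE code, every name in it (`DroppedMutationDemo`, `Replica`, `dropNextWrite`, `repair`) is hypothetical, and the `repair` method only stands in for whatever process eventually copies the missing row to the lagging replica. It shows a write that is acknowledged while one replica drops it, a reindex that changes nothing, and a repair that closes the gap.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; not DSE code. All names here are hypothetical.
final class DroppedMutationDemo {

    static final class Replica {
        final Set<String> rows = new HashSet<>();  // data stored on this node
        final Set<String> index = new HashSet<>(); // local search index built from rows
        boolean dropNextWrite = false;             // simulates one dropped mutation

        /** Stores and indexes the row unless the mutation is dropped. */
        boolean write(String key) {
            if (dropNextWrite) {
                dropNextWrite = false;
                return false;
            }
            rows.add(key);
            index.add(key);
            return true;
        }

        /** Rebuilds the index from local data only. */
        void reindex() {
            index.clear();
            index.addAll(rows);
        }
    }

    /** Returns true when at least requiredAcks replicas stored the row. */
    static boolean write(List<Replica> replicas, String key, int requiredAcks) {
        int acks = 0;
        for (Replica r : replicas) {
            if (r.write(key)) {
                acks++;
            }
        }
        return acks >= requiredAcks;
    }

    /** Copies rows missing on a replica from the others and indexes them. */
    static void repair(List<Replica> replicas) {
        Set<String> union = new HashSet<>();
        for (Replica r : replicas) {
            union.addAll(r.rows);
        }
        for (Replica r : replicas) {
            for (String key : union) {
                if (r.rows.add(key)) {
                    r.index.add(key);
                }
            }
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();
        List<Replica> replicas = List.of(a, b);

        b.dropNextWrite = true;
        boolean acked = write(replicas, "doc-1", 1); // one ack is enough here
        System.out.println("acked=" + acked + " a=" + a.index.size() + " b=" + b.index.size());

        a.reindex();
        b.reindex();
        System.out.println("after reindex: a=" + a.index.size() + " b=" + b.index.size());

        repair(replicas);
        System.out.println("after repair: a=" + a.index.size() + " b=" + b.index.size());
    }
}
```

In this model the two replicas report 1 and 0 indexed documents after the acknowledged write, the same after a reindex, and 1 and 1 only after the repair, which mirrors the point that the gap sits at the database level.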

DSE versions starting from 3.2.5 should include a Cassandra bug fix that greatly reduces dropped mutations: https://issues.apache.org/jira/browse/CASSANDRA-6510

Please let us know your DSE version, and if upgrading helps.

0
votes

Occasional inconsistencies such as the ones described have been fixed in the 4.x series of releases. Is it possible to upgrade to a newer release?