2
votes

I have ingested 200 million records into Cassandra from Spark, using the spark-cassandra-connector.

I have faced the following two problems. Sorry, the subject refers to only one of them.

1) com.datastax.driver.core.exceptions.WriteFailureException: Cassandra failure during write query at consistency LOCAL_QUORUM (1 response were required but only 0 replica responded, 1 failed)

I figured that a higher replication factor, preferably 3, would solve the issue, but I still faced the same error.

Do I need to restart the cluster?
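The error message itself hints at what is going on: LOCAL_QUORUM requires a majority of replicas to acknowledge the write, and the quorum size depends directly on the replication factor. The small sketch below (plain Python, just the standard floor(RF/2) + 1 arithmetic Cassandra uses) shows why RF = 1 makes every single-replica failure fatal, while RF = 3 tolerates one failed or slow replica:

```python
def quorum(replication_factor: int) -> int:
    """Number of replica acknowledgements LOCAL_QUORUM waits for:
    floor(RF / 2) + 1, per Cassandra's consistency-level definition."""
    return replication_factor // 2 + 1

# With RF = 1, LOCAL_QUORUM needs exactly 1 response, which matches the
# error text "1 response were required but only 0 replica responded".
print(quorum(1))  # 1 -> the lone replica failing fails the whole write
print(quorum(3))  # 2 -> the write survives one failed/slow replica
```

So a restart is not the fix: what matters is that the replication factor is actually in effect on the keyspace (see below) and that the extra replicas hold the data.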

2nd (and more important): I ran a Spark job to do a count(*) on my table. The job did not report any errors, yet each run gives me a different count. I strongly believe Cassandra is stable and solid, so I may be missing some important pieces here.

My actual number of rows: 286,530,307
1st run: 285,508,150
2nd run: 285,174,293
3rd run: 285,232,533

Why do I get different results on different runs?

My keyspace creation:

CREATE KEYSPACE IF NOT EXISTS db_research WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
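One thing worth noting (an assumption about how the keyspace was set up, since only the final CREATE statement is shown): because of IF NOT EXISTS, this statement is a no-op if db_research already existed with a lower replication factor. In that case the RF must be raised explicitly and the existing data streamed to the new replicas:

```sql
-- If db_research already existed with RF < 3, CREATE ... IF NOT EXISTS
-- did nothing. Raise the RF explicitly:
ALTER KEYSPACE db_research WITH REPLICATION =
  { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
```

Changing the RF only changes metadata; the pre-existing rows are not copied to the new replicas until a repair is run (see the answer below).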

My table has 28 columns.

Did any of my errors trigger these results? Even if there are errors, shouldn't it show the same count every time? What am I missing?
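A plausible explanation for the fluctuating counts (hedged, since it depends on the connector version and settings): the failed LOCAL_QUORUM writes left the replicas out of sync, and the spark-cassandra-connector reads at a weaker consistency level by default (LOCAL_ONE in the versions I am familiar with), so each run may hit a different, differently-complete replica for each token range. Raising the read consistency should make the counts repeatable even before a repair; the property name below is from the connector's reference documentation and should be verified against your version:

```
# spark-defaults.conf or --conf flag for the spark-cassandra-connector:
spark.cassandra.input.consistency.level=LOCAL_QUORUM
```

Note that with RF = 1 this alone cannot help, because there is only one copy of each row to read.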


1 Answer

0
votes

Try running nodetool repair - this will synchronize replicas across your cluster: https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
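For completeness, a sketch of the command (exact flags vary by Cassandra version, so treat this as an illustration rather than the definitive invocation):

```
# Run on each node; restricting it to the keyspace limits the work done:
nodetool repair db_research
```

After the repair finishes on all nodes, every replica holds the same data, and reads (and therefore counts) should be consistent across runs.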