
Our system has 3-4 tables where we keep counters (Cassandra's counter data type) for events fired from our applications. We use Kafka for queueing, and the application is built with Dropwizard.

The relevant part of the system looks like this:

[Ingestion Module] -> Kafka -> [Analytics Module] -> Cassandra

The data arrives at high volume, and the moment we increase the number of workers/consumers in the Analytics Module, we start getting the following exceptions:

! com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during COUNTER write query at consistency LOCAL_ONE (1 replica were required but only 0 acknowledged the write)
! at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:88)
! at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:66)
! at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297)
! at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268)
! at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
! ... 25 common frames omitted

Cassandra setup:

  • Nodes: 5
  • Replication Factor: 2
  • version: 3.4

Query 1

Can someone please help us with the possible causes of and solutions for this problem, or point us in the right direction?

Query 2

I have one more query about the counter data type. Is an update on a counter thread-safe, or can it lead to inconsistency if we update the same counter from multiple workers?


1 Answer


The counter type isn't a "reliable" counter: by its nature, when a write times out you don't know whether the increment happened or not. You can retry the operation, but that may lead to a double write. If you don't retry, you may lose data.
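To make the dilemma concrete, here is a minimal simulation in plain Java (no driver calls, all names hypothetical): the increment is applied server-side, but the acknowledgment is lost, so the client only sees a timeout and a naive retry double-counts.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterRetryDemo {
    // Stands in for the replica's state after "UPDATE ... SET c = c + 1".
    static final AtomicLong serverSideCounter = new AtomicLong();

    // Models a counter write that IS applied on the server, but whose
    // acknowledgment never reaches the client (a WriteTimeoutException).
    static void incrementButTimeOut() {
        serverSideCounter.incrementAndGet();
        throw new RuntimeException("timeout: write may or may not have been applied");
    }

    public static void main(String[] args) {
        try {
            incrementButTimeOut();
        } catch (RuntimeException timeout) {
            // Naive retry: the first increment already landed,
            // so one event is now counted twice.
            serverSideCounter.incrementAndGet();
        }
        System.out.println(serverSideCounter.get()); // prints 2 for a single event
    }
}
```

This is exactly why the DataStax driver treats counter updates as non-idempotent and will not retry them automatically: neither retrying nor giving up is safe in general.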

But if you need reliable counting, you can use another approach: write every count event as a separate row (with the write marked as idempotent, so it can be safely retried and will simply overwrite the same data) inside some partition, and then have a separate job that goes through all the rows and sums the individual counts.
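A sketch of that pattern, modeled with plain collections instead of a live cluster (the schema and all names are hypothetical). Each event is inserted under a unique event id, so a retried write after a timeout overwrites the same row instead of double-counting; with the real driver you would additionally mark the prepared insert idempotent via `Statement.setIdempotent(true)` so the retry policy is allowed to retry it.

```java
import java.util.HashMap;
import java.util.Map;

public class EventCountSketch {
    // Hypothetical table:
    //   CREATE TABLE events (counter_id text, event_id uuid,
    //     PRIMARY KEY (counter_id, event_id));
    // Modeled here as partition key -> (clustering key -> row marker).
    static final Map<String, Map<String, Boolean>> table = new HashMap<>();

    // The insert is idempotent: writing the same (counter_id, event_id)
    // twice, e.g. after a timeout retry, leaves exactly one row.
    static void recordEvent(String counterId, String eventId) {
        table.computeIfAbsent(counterId, k -> new HashMap<>()).put(eventId, true);
    }

    // The periodic job from the answer: scan the partition, sum the rows.
    static long count(String counterId) {
        return table.getOrDefault(counterId, Map.of()).size();
    }

    public static void main(String[] args) {
        recordEvent("page_views", "evt-1");
        recordEvent("page_views", "evt-1"); // retry after a timeout: overwrite, no double count
        recordEvent("page_views", "evt-2");
        System.out.println(count("page_views")); // prints 2
    }
}
```

Note the trade-off: this turns one small counter cell into one row per event, so you would normally bucket the partition key (e.g. by hour) to keep partitions bounded, and have the summing job roll finished buckets up into a regular (non-counter) total.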