1 vote

I have a Cassandra (2.2.1) cluster of 4 nodes which is used by a Java client application. The replication factor is 3, and the consistency level is LOCAL_QUORUM for both reads and writes. Each node holds around 5 GB of data. The request rate is approximately 2-4k per second. There are almost no delete operations, so only a small number of tombstones are created.
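
For context, here is a minimal sketch of how such a client is typically configured with the DataStax Java driver (which the exception in point 1 below suggests is in use); the contact point and keyspace names are hypothetical:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;

    public class ClientSetup {
        public static void main(String[] args) {
            // With RF = 3, LOCAL_QUORUM means 2 of the 3 replicas must answer
            // every read and write, which matches the timeout messages below.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("1.1.1.1")                        // hypothetical node address
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                    .build();
            Session session = cluster.connect("my_keyspace");          // hypothetical keyspace
            // ... reads and writes issued here ...
            cluster.close();
        }
    }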

I noticed poor read and write performance some time ago, and it is getting worse over time - the cluster is becoming really slow. Read (most often) and write timeouts have become very frequent. Hardware should not be the cause: the servers the cluster is deployed on are really good in terms of disk performance, CPU and RAM.

The cause of the issue is unclear to me, but I have noticed several log entries which may point to the root cause:

  1. Exception stack trace in the Java client application log:

    com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)

It is interesting that one replica still responded.

  2. Several entries about failed hint replay:

    Failed replaying hints to /1.1.1.1; aborting (135922 delivered), error : Operation timed out - received only 0 responses.

  3. Several exceptions like the following in the Cassandra logs:

    Unexpected exception during request; channel = [id: 0x10fc77df, /2.2.2.2:54459 :> /1.1.1.1:9042]
    java.io.IOException: Error while read(...): Connection timed out
        at io.netty.channel.epoll.Native.readAddress(Native Method) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]

  4. Failed batch errors:

    Batch of prepared statements for [<...>] is of size 3453794, exceeding specified threshold of 1024000 by 2429794. (see batch_size_fail_threshold_in_kb)

It looks like the batch is too large; we do have lots of batch operations, by the way. Could large batches be affecting the whole cluster?
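
If atomicity is only needed within a single partition (or not strictly needed at all), one common alternative is to replace a huge logged batch with small per-partition batches or individual asynchronous writes. Below is a minimal sketch using the DataStax Java driver; the keyspace, table and columns are hypothetical, and note that it gives up the batch's all-or-nothing guarantee:

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;

    public class AsyncInsteadOfBigBatch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("1.1.1.1").build();
            Session session = cluster.connect("my_keyspace");           // hypothetical keyspace

            // Hypothetical table: CREATE TABLE events (id uuid PRIMARY KEY, payload text)
            PreparedStatement insert = session.prepare(
                    "INSERT INTO events (id, payload) VALUES (?, ?)");

            // Instead of one multi-megabyte batch, issue individual async writes
            // and wait for them all; coordinator load stays bounded and
            // batch_size_fail_threshold_in_kb is never hit.
            List<ResultSetFuture> futures = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                BoundStatement bound = insert.bind(UUID.randomUUID(), "payload-" + i);
                futures.add(session.executeAsync(bound));
            }
            for (ResultSetFuture f : futures) {
                f.getUninterruptibly();  // surfaces per-statement write timeouts
            }

            cluster.close();
        }
    }

Logged batches remain the right tool when a group of statements must succeed or fail together, but keeping them small (ideally within one partition) avoids the batch_size_fail_threshold_in_kb error above.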

  5. Finally, the exception which is seen most often - these entries appear one after another after switching the logging level to DEBUG:

    TIOStreamTransport.java:112 - Error closing output stream.
    java.net.SocketException: Socket closed
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) ~[na:1.8.0_66]
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_66]
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_66]
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_66]
        at java.io.FilterOutputStream.close(FilterOutputStream.java:158) ~[na:1.8.0_66]
        at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) ~[libthrift-0.9.2.jar:0.9.2]
        at org.apache.cassandra.thrift.TCustomSocket.close(TCustomSocket.java:197) [apache-cassandra-2.2.1.jar:2.2.1]
        at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89) [libthrift-0.9.2.jar:0.9.2]
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:209) [apache-cassandra-2.2.1.jar:2.2.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]

Do you have any ideas about what could be causing this problem?

Thank you!

Are you doing batches for atomic reasons? Because batches are not for performance in Cassandra. – Jeff Beck

Yes, I am doing batches for atomic reasons only. I just thought - can large batches slow down the whole cluster? – Atver

Yes, large batches will degrade performance. – Chris Gerlt

2 Answers

0 votes

For the 1st point I have an idea:

When you issue a query, there is always a thread that takes care of it.

If there are too many requests, they are placed in a queue.

There is also a timeout for how long a request can wait in that queue.

So your replicas are not replying fast enough because some of the requests serving a specific query are discarded.

Consider playing a little with the number of read/write threads. If your hardware is good enough, you can allocate more workers in that area.

I remember playing with cassandra-stress a while ago, where the default for rate threads= was 32. Consider increasing the following in cassandra.yaml:

  • concurrent_reads from 32 up to 128
  • concurrent_writes from 32 up to 128

You may also consider DECREASING these numbers. I recommend that you test, test and retest.

You may also play with the timeouts (how long a request can wait in a queue before it is served); the sketch after this list shows both groups of settings:

  • read_request_timeout_in_ms from 5000 up to something like 10000
  • write_request_timeout_in_ms from 2000 up to something like 5000.
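
For reference, a minimal cassandra.yaml sketch with the values suggested above (these are illustrative starting points for benchmarking, not recommendations; the right values depend on your hardware):

    # cassandra.yaml - illustrative tuning values, benchmark before adopting
    concurrent_reads: 128                 # default 32
    concurrent_writes: 128                # default 32
    read_request_timeout_in_ms: 10000     # default 5000
    write_request_timeout_in_ms: 5000     # default 2000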

On point 2 I suspect the same: your node is trying to replay the hints, so two things may be happening:

  1. it is not reaching the node (check for network issues);

  2. maybe you need to allocate more worker threads by increasing max_hints_delivery_threads (see the sketch below).
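
If it is the second case, a minimal cassandra.yaml sketch of the relevant settings (the values are only illustrative starting points; max_hints_delivery_threads defaults to 2):

    # cassandra.yaml - hint delivery tuning, illustrative values only
    max_hints_delivery_threads: 4        # default is 2
    hinted_handoff_throttle_in_kb: 1024  # default; raise only if the network and disks can take it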

Point 3 looks related to point 1.

Good luck.

0 votes

It might actually be connected to the threads' limited stack memory, which cannot handle the hints. It may be solved by increasing -Xss. See more: https://issues.apache.org/jira/browse/CASSANDRA-4740
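
For example, a minimal sketch of where that change would go in conf/cassandra-env.sh (the value is only illustrative; check the default your version ships with before raising it):

    # conf/cassandra-env.sh - per-thread stack size (illustrative value)
    JVM_OPTS="$JVM_OPTS -Xss512k"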