Are writes to replicas done in parallel?

Question

I have some doubts about how Cassandra performs a write request. I have two scenarios; please read them and confirm which one is correct.

Assume we have a cluster that consists of 4 nodes: N1, N2, N3, and N4. As Cassandra arranges the nodes in a ring topology, the nodes are linked as follows:

N1-->N2-->N3-->N4-->N1

We also have a replication factor of 3 (RF=3) and a consistency level of ALL (CL=ALL).

The client sends a write request, W, to the coordinator, say N4. The partitioner has determined that the primary node for W is N1.

What will happen now?

Scenario 1: the coordinator sends W to N1. Upon receiving W, N1 stores it locally (in the commit log and memtable; please ignore the internal process) and acknowledges the coordinator N4. Then N1 sends a copy of W to N2 (because N2 is the next node in the ring from N1's perspective). Upon receiving W, N2 stores it locally and sends an acknowledgement to N4. Then N2 sends a copy of W to N3 (because N3 is the next node in the ring from N2's perspective). Upon receiving W, N3 stores it locally and acknowledges the coordinator N4. Finally, as soon as the coordinator, N4, receives an acknowledgement from all nodes (N1, N2, and N3), it replies to the client.

Note that if scenario 1 is correct, the latency will be 4 rounds:

N4-->N1-->N2-->N3-->N4-->client

Scenario 2: the coordinator, N4, broadcasts W to N1, N2, and N3: N4-->N1, N4-->N2, N4-->N3.

Then the replicas (N1, N2, and N3) store W locally and acknowledge to N4. When N4 has received all the ACKs, it replies to the client.

Can anyone confirm which scenario is correct in Cassandra?

Regards

Alec Collier · Accepted Answer · 2015-08-21T16:09:08

Scenario 2 is correct. The requests are sent in parallel.

There is no benefit in writing to the replicas sequentially; it would simply make the request take that much longer, as you pointed out. And if one of the nodes is down, it would take that much longer to find that out.
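To make the latency difference concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch, not Cassandra code: the function names and the 1 ms one-way network delay are assumptions chosen for illustration, and local write time is ignored.

```python
def chained_latency(hop_ms):
    """Scenario 1: each hop waits for the previous one, so the delays add up."""
    return sum(hop_ms)


def parallel_latency(round_trip_ms):
    """Scenario 2 with CL=ALL: the coordinator waits for the slowest replica."""
    return max(round_trip_ms)


# Assumed (hypothetical) one-way delay of 1 ms between any two nodes.
chain = chained_latency([1, 1, 1, 1])   # N4->N1, N1->N2, N2->N3, N3->N4
fan_out = parallel_latency([2, 2, 2])   # round trips N4<->N1, N4<->N2, N4<->N3

print(f"chained: {chain} ms, parallel: {fan_out} ms")
```

Under these assumptions the chained write grows with the number of replicas, while the parallel write is bounded by the slowest single round trip.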

Also note that your example assumes a consistency level of ALL, i.e. the coordinator waits for an acknowledgement from every node holding a replica of the data before returning to the client. If you use a lower consistency level, say ONE or QUORUM, the coordinator doesn't have to wait for an ACK from every replica and can return to the client sooner.
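The effect of the consistency level can be sketched the same way. This is a simplified model rather than Cassandra's actual implementation: the helper names and the acknowledgement times are made up for illustration, and only ONE, QUORUM, and ALL are handled. The write still goes to all replicas in parallel; the consistency level only decides how many ACKs the coordinator waits for.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica ACKs the coordinator waits for before replying."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # a majority of the replicas
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def coordinator_response_ms(ack_times_ms, consistency_level):
    """Time at which enough ACKs have arrived, given per-replica ACK times."""
    needed = required_acks(consistency_level, len(ack_times_ms))
    # The coordinator can reply once the `needed`-th fastest ACK is in.
    return sorted(ack_times_ms)[needed - 1]


# Hypothetical ACK arrival times in ms; N3 is a slow replica.
acks = {"N1": 3, "N2": 5, "N3": 40}
results = {
    cl: coordinator_response_ms(list(acks.values()), cl)
    for cl in ("ONE", "QUORUM", "ALL")
}
print(results)
```

With RF=3 and one slow replica, ONE and QUORUM are not held back by N3, whereas ALL has to wait for it.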

