10
votes

I understand in Mongo we can have one master and multiple slaves where the master will be used for writes and slaves will be used for reading operations. Say M1, M2, M3 are nodes with M1 as master

But I read Cassandra is said to be a master-less model. Every node is said to be master. I did not get what it means?

Say M1, M2, M3 are nodes with M1 as master and M2, M3 are slaves in Mongo I believe write will always go M1 and read will always go to M2, M3

Say C1, C2, C3 are nodes in Cassandra Here I believe write and Read request can go to any node. That's why it is called master-less model.

2

2 Answers

15
votes

You are right, the nodes in Cassandra are equal and all of them can respond to user's query. That's because Cassandra picks Availability and Partition Tolerance in the CAP Theorem (whereas MongoDB picks Consistency and Partition Tolerance). And Cassandra can have linear scalability by simply adding new nodes into the node ring to handle more data.

The tradeoff here is the consistency problem. In Cassandra, they provide a solution called Replication Factor and Consistency Level to ensure the consistency as much as possible while maintaining strong availability.

Here is a good explanation of how read, write and replication work in Cassandra: brief-introduction-apache-cassandra

2
votes

In Cassandra, Masterless means any can process the request. The node which receives the request is called as coordinator node and its responsibility is to coordinate the read/write request.

Coordinator nodes use the quorum based protocol for handling read/write operations. Based on the consistency level, it issues R read request from the identified nodes which may have the data and then compares there data timestamp and returns the latest updated one.

For all the nodes which responded with stale data, were then sent an update request so that they could have the latest data now, this is probably known as hinted repair.

the same thing happens for writes has as well, It waits for at least say w node confirmations about write success before it declares the operation as a success.

The choice of particular number (r & w) defines the consistency level used. Let's say the sum of a number of reads & write nodes exceeds total available nodes (N) (r + w >= N) then you are always guaranteed to have the latest data, but it comes at the cost of availability.

Similarly if r + w < N, Consistency suffers but the availability of the system increases.