Couchbase Cluster Failover Architecture

Question

I’m referring to the Couchbase Server in the application stack section of this document, outlining the desired architecture of a Couchbase Cluster.

I notice that each of the 5 Couchbase nodes in the diagram have a corresponding web server. I am also aware that Couchbase SDKs are designed to establish a connection to a single node, and retain that connection for all requests, with the exception of failover events.

Firstly, I want to confirm that each of the 5 web servers in the diagram will establish a single connection to one of the 5 Couchbase nodes. I assume that a 1:1 relationship will result; each web server will connect to a corresponding Couchbase node, such that no 2 web servers will establish connections to the same Couchbase node.

If this is the case, then in the event of Couchbase node-failure, assuming that the node is unrecoverable, should I remove the corresponding web server? This may seem unintuitive, but here is the situation as I understand it:

Couchbase node #1 dies
Web server #1 (connected to Couchbase node #1) establishes a connection to the next available node, Couchbase node #2 (most SDKs handle this, FAIA)
Couchbase node #2 now has 2 established connections; from web server #2 (its corresponding server) and also now from web server #1 (whose corresponding Couchbase node is dead)

My concern is that I have noticed ephemeral port exhaustion issues with Couchbase Server, when establishing more than 1 connection to a single node. This generally results in client timeouts:

Get http://0.0.0.0:8091/pools: dial tcp 0.0.0.0:8091: operation timed out

Again, based on this, should I also remove the corresponding web server when a Couchbase node dies, to avoid multiple connections to the same Couchbase node, and potential ephemeral port exhaustion?

Kirk Kirk · Accepted Answer · 2015-07-10T16:42:13

There is not a 1:1 relationship between web server and Couchbase node. Each web server has connections to each Couchbase node. In Couchbase each node of the cluster has a percentage of the entire data set active, not a full copy. Couchbase automatically shards the data and these shards (vBuckets) are spread evenly across the entire cluster.

So when the web server or app server is going to read/write an object, it will go to the corresponding node in the cluster that owns the vBucket where that object lives. In the Couchbase SDKs there is a consistent hash that takes each object's ID and the output of the hash is a number between 1 and 1024. There are 1024 active vBuckets and each replica has another 1024. So the output of that consistent has is the vBucket ID that object will live in. Make sense? The SDK then quickly looks up in its copy of the cluster map (which is updated any time there is a cluster topology change) which node of the cluster that shard lives on and then goes to interact with that specific node directly for that object.

So your failure scenario is not quite correct. If a node of the Couchbase cluster fails, only the vBuckets that are on that node are unavailable. So only a percentage of the entire data set. If you have auto-failure turned on (off by default) then after the timeout set in the cluster, the cluster will automatically fail out the node timing out and promote the replica vBuckets to active, thus getting you back to 100% active data set. The cluster sacrifices those replica vBuckets basically. Since this is a topology change, a new cluster map is sent to your client applications with the SDKs and live moved on. Also, you will need to do a rebalance of the cluster to regenerate those now missing replica vBuckets and get you back to normal.

As for your ephemeral port exhaustion, how are you managing your connections to the cluster? Are you reusing the connections or opening new connections each time and then not closing them? You want to be opening connections and reusing them, not just keep opening new ones over and over. If you do open new ones each time and not clean up, you will definitely exhaust your port thus file descriptors. Like I said, reuse them.

Couchbase Cluster Failover Architecture

1 Answers