Is using a load balancer with ElasticSearch unnecessary?

Question

I have a cluster of 3 ElasticSearch nodes running on AWS EC2. These nodes are set up using OpsWorks/Chef. My intent is to design this cluster to be very resilient and elastic (nodes can come in and out when needed).

From everything I've read about ElasticSearch, nobody seems to recommend putting a load balancer in front of the cluster; instead, the recommendation seems to be one of two things:

Point your client at the URL/IP of one node, let ES do the load balancing for you and hope that node never goes down.
Hard-code the URLs/IPs of ALL your nodes into your client app and have the app handle the failover logic.
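The second option, client-side failover over a hard-coded node list, fits in a few lines. This is a minimal sketch, not any client library's API: `request_with_failover`, `fake_send`, and the `node0`/`node1`/`node2` URLs are hypothetical, and a real `send` would make an HTTP request to the node and raise `ConnectionError` when it is unreachable.

```python
from __future__ import annotations

from typing import Callable, Sequence


class AllNodesFailed(Exception):
    """Raised when no node in the list could serve the request."""


def request_with_failover(nodes: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each node in order and return the first successful response.

    `send` is a hypothetical transport function supplied by the caller;
    it is expected to raise ConnectionError for an unreachable node.
    """
    errors: dict[str, Exception] = {}
    for node in nodes:
        try:
            return send(node)
        except ConnectionError as exc:
            # Remember why this node failed, then fall through to the next one.
            errors[node] = exc
    raise AllNodesFailed(f"no node answered: {errors}")


# Hypothetical node list and a fake transport where node0 is down.
NODES = ["http://node0:9200", "http://node1:9200", "http://node2:9200"]


def fake_send(url: str) -> str:
    if url.endswith("node0:9200"):
        raise ConnectionError("node0 is down")
    return f"200 OK from {url}"


print(request_with_failover(NODES, fake_send))  # 200 OK from http://node1:9200
```

Official Elasticsearch clients typically accept a list of hosts and retry against other nodes themselves, so in practice this is usually configuration rather than code you write.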

My background is mostly in web farms where it's just common sense to create a huge pool of autonomous web servers, throw an ELB in front of them and let the load balancer decide what nodes are alive or dead. Why does ES not seem to support this same architecture?

xeraa · Accepted Answer · 2014-07-15T17:03:40

You don't need a load balancer; ES already provides that functionality. You'd just be adding another component, one that could misbehave and that would add an unnecessary network hop.

ES will shard your data (by default into 5 primary shards; Elasticsearch 7.0 and later default to 1) and will try to distribute the shards evenly among your instances. In your case, two instances would end up with two shards each and the third with just one, so you might want to set the number of shards to 6 for an even distribution.
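The shard arithmetic above can be checked with a simple round-robin count. This is a simplification: `shards_per_node` is a hypothetical helper, and Elasticsearch's real allocator balances shard copies with its own heuristics, so actual placement may differ. The `index_settings` dict shows the body you would send when creating the index; the number of primary shards is fixed at creation time.

```python
from __future__ import annotations


def shards_per_node(num_shards: int, num_nodes: int) -> list[int]:
    """Count shard copies per node when they are dealt out round-robin.

    Hypothetical helper: Elasticsearch's allocator uses its own balancing
    heuristics, so real placement can differ.
    """
    counts = [0] * num_nodes
    for shard in range(num_shards):
        counts[shard % num_nodes] += 1
    return counts


# Primaries only: 5 shards over 3 nodes is uneven, 6 is even.
print(shards_per_node(5, 3))  # [2, 2, 1]
print(shards_per_node(6, 3))  # [2, 2, 2]

# Counting one replica per shard too (5 + 5 = 10 vs 6 + 6 = 12 copies).
print(shards_per_node(10, 3))  # [4, 3, 3]
print(shards_per_node(12, 3))  # [4, 4, 4]

# Body for creating an index with 6 primaries and 1 replica each
# (e.g. sent with PUT /<index-name>); the primary count is set at creation.
index_settings = {"settings": {"number_of_shards": 6, "number_of_replicas": 1}}
```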

By default, replication is set to "number_of_replicas": 1, meaning one replica of each shard. Assuming you use 6 shards, the layout could look something like this (R marks a replica):

node0: 1, 4, R3, R6
node1: 2, 6, R1, R5
node2: 3, 5, R2, R4
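The example layout can be checked mechanically. The sketch below uses a hypothetical `check_layout` helper, numbers shards 1 to 6 as in the answer, and assumes one replica per shard; it verifies the invariant that makes the layout fault-tolerant: no node holds both the primary and the replica of the same shard.

```python
from typing import Dict, Set, Tuple

# node name -> (primary shard numbers, replica shard numbers)
Layout = Dict[str, Tuple[Set[int], Set[int]]]

# The example layout from the answer (shards numbered 1-6, one replica each).
LAYOUT: Layout = {
    "node0": ({1, 4}, {3, 6}),
    "node1": ({2, 6}, {1, 5}),
    "node2": ({3, 5}, {2, 4}),
}


def check_layout(layout: Layout, num_shards: int) -> bool:
    """True if each shard 1..num_shards has exactly one primary and one
    replica, and no node holds both copies of the same shard."""
    primaries = sorted(s for p, _ in layout.values() for s in p)
    replicas = sorted(s for _, r in layout.values() for s in r)
    expected = list(range(1, num_shards + 1))
    if primaries != expected or replicas != expected:
        return False
    # A replica on its primary's node would be lost together with the primary.
    return all(not (p & r) for p, r in layout.values())


print(check_layout(LAYOUT, 6))  # True
```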

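Assuming node1 dies, the replicas of its primaries (2 and 6) are promoted to primaries, the lost replicas are rebuilt on the remaining nodes, and the cluster would change to the following setup: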

node0: 1, 4, 6, R3 + new replicas R5, R2
node2: 3, 5, 2, R4 + new replicas R1, R6
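The failover step can be simulated with a toy model. This is not Elasticsearch's allocation algorithm: `fail_node` is a hypothetical helper that promotes the surviving replica of each lost primary, then places each missing replica on the least-loaded surviving node that does not already hold a copy of that shard. With `node1` removed, it reproduces the layout above.

```python
from typing import Dict, Set, Tuple

# node name -> (primary shard numbers, replica shard numbers)
Layout = Dict[str, Tuple[Set[int], Set[int]]]

LAYOUT: Layout = {
    "node0": ({1, 4}, {3, 6}),
    "node1": ({2, 6}, {1, 5}),
    "node2": ({3, 5}, {2, 4}),
}


def fail_node(layout: Layout, dead: str) -> Layout:
    """Toy recovery model (not Elasticsearch's allocator).

    Assumes every primary on `dead` has a replica on a surviving node.
    """
    # Copy the surviving nodes so the input layout is left untouched.
    survivors: Layout = {n: (set(p), set(r)) for n, (p, r) in layout.items() if n != dead}
    lost_primaries, lost_replicas = layout[dead]

    # Step 1: promote the surviving replica of each lost primary.
    for shard in sorted(lost_primaries):
        for primaries, replicas in survivors.values():
            if shard in replicas:
                replicas.discard(shard)
                primaries.add(shard)
                break

    # Step 2: every shard that lost a copy needs a new replica.
    for shard in sorted(lost_primaries | lost_replicas):
        candidates = [
            n for n, (p, r) in survivors.items() if shard not in p and shard not in r
        ]
        if not candidates:
            continue  # no eligible node: the replica stays unassigned
        # Prefer the node holding the fewest shard copies (ties broken by name).
        target = min(candidates, key=lambda n: (len(survivors[n][0]) + len(survivors[n][1]), n))
        survivors[target][1].add(shard)
    return survivors


for node, (primaries, replicas) in fail_node(LAYOUT, "node1").items():
    print(node, sorted(primaries), sorted(replicas))
# node0 [1, 4, 6] [2, 3, 5]
# node2 [2, 3, 5] [1, 4, 6]
```

If no eligible node remains (for example, only one node survives), the sketch leaves the replica unassigned, which is the situation Elasticsearch reports as yellow cluster health.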

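Depending on how you connect, you can either connect to one or more instances (transport client) or join the cluster yourself (node client). With the node client you avoid the extra hop, since it knows the cluster state and always sends requests directly to the node holding the correct shard/index. With the transport client, the instance you connect to routes your requests to the correct node. (Both are Java APIs of the Elasticsearch versions of the time; the node client was removed in 5.0 and the transport client in 8.0.)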
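The node client can skip the extra hop because the target shard is computable. In simplified form, Elasticsearch picks the shard as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id` (newer versions add a routing-factor step, but the idea is the same). The sketch below assumes that simplified formula, uses `zlib.crc32` as a stand-in for Elasticsearch's actual Murmur3 hash (so shard numbers will not match a real cluster), and uses a hypothetical `PRIMARY_LOCATION` map built from the example layout. For indexing, the request goes to the primary; searches can also be served by replicas.

```python
import zlib

NUM_PRIMARY_SHARDS = 6
# Hypothetical cluster state: primary shard number -> node that holds it,
# taken from the example layout (shards numbered 1-6).
PRIMARY_LOCATION = {1: "node0", 4: "node0", 2: "node1", 6: "node1", 3: "node2", 5: "node2"}


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Simplified routing: hash(_routing) % number_of_primary_shards.

    crc32 stands in for Elasticsearch's Murmur3 hash, so results will not
    match a real cluster. Returns 1-based numbers to match the layout.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards + 1


def node_for(doc_id: str) -> str:
    """Node a cluster-aware client could contact directly; any other node
    it hit would have to forward the request (one extra hop)."""
    # _routing defaults to the document _id.
    return PRIMARY_LOCATION[shard_for(doc_id)]


for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", shard_for(doc_id), "on", node_for(doc_id))
```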

So there's nothing for you to load-balance yourself; a load balancer would just add overhead. Automatic clustering is probably ES's greatest strength.
