We have a web application running on AWS with the following architecture:
- 1 elasticsearch cluster with 2 data nodes
- 1 auto-scaling load-balanced cluster of web servers
As elasticsearch does some clever internal load balancing, we could just point all the web servers at one of the data nodes. But this would create a single point of failure: if that node goes down, we don't get any query results.
My solution thus far has been to run elasticsearch on each web server as a non-data node. Each web server queries its local elasticsearch node, which in turn farms the request off to one of the data nodes. This seems to be the suggested approach on the elasticsearch website.
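For concreteness, here's roughly what the elasticsearch.yml on each web server looks like in this setup. The cluster name and the data nodes' private IPs are placeholders, and the discovery settings assume a pre-2.0 zen unicast configuration:

```yaml
# elasticsearch.yml on each web server: join the cluster but hold no data,
# so the local node just routes queries to the real data nodes.
cluster.name: my-search-cluster        # placeholder; must match the data nodes
node.data: false                       # never store shards on this node
node.master: false                     # never act as master; pure client node
http.port: 9200                        # the web app queries http://localhost:9200

# Multicast discovery doesn't work on EC2, so list the data nodes explicitly
# (placeholder private IPs):
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.10", "10.0.0.11"]
```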
This is great in that if one of the data nodes fails in some way, we don't lose the ability to serve search queries. However, it does mean elasticsearch is using resources on each web server, and if we migrate to Elastic Beanstalk (which I'm keen to do) we'll need to somehow get elasticsearch installed on our web instances. EDIT: I've succeeded with this now, but have yet to figure out how to specify a different config for each environment.
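One possible shape for the per-environment part, sketched with made-up names (the .ebextensions file name, paths, and the ES_DATA_NODES environment property are all placeholders I've invented; the idea relies on Beanstalk exposing environment properties to container_commands, with a different ES_DATA_NODES value set in each environment's configuration):

```yaml
# .ebextensions/elasticsearch.config (hypothetical) -- writes a client-node
# config, filling in the data-node list from a per-environment property.
files:
  "/etc/elasticsearch/elasticsearch.yml":
    mode: "000644"
    owner: root
    group: root
    content: |
      cluster.name: my-search-cluster
      node.data: false
      node.master: false
      discovery.zen.ping.multicast.enabled: false
      discovery.zen.ping.unicast.hosts: [__ES_DATA_NODES__]

container_commands:
  01_fill_in_hosts:
    # ES_DATA_NODES is a made-up environment property, set differently per
    # Beanstalk environment, e.g. "10.0.0.10", "10.0.0.11" for production.
    command: sed -i "s/__ES_DATA_NODES__/$ES_DATA_NODES/" /etc/elasticsearch/elasticsearch.yml
  02_restart_elasticsearch:
    command: service elasticsearch restart
```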
Is there another way to avoid a single point of failure without having elasticsearch running on each web server?
I thought about putting a load balancer in front of the data nodes to serve queries from the web servers, but that would mean opening the cluster up to public access unless we also set up a VPC to restrict access.
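To illustrate what I mean: the attraction of that option is that every web server would share one stable endpoint instead of running a local node (the ELB hostname below is made up):

```yaml
# Hypothetical app-side search config if an ELB fronted the data nodes:
# one endpoint, no local elasticsearch process on the web servers.
search:
  elasticsearch_url: "http://my-es-lb-1234567890.us-east-1.elb.amazonaws.com:9200"
```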
Is there a simpler solution I'm missing?