I have a 15-node Elasticsearch cluster and am indexing a lot of documents. The documents are of the form { "message": "some sentences" }. When I had a 9-node cluster, I could get CPU utilization up to 80% on all nodes. Now that I have grown it to 15 nodes, I get 90% CPU usage on 4 nodes and only ~50% on the rest.

The specification of the cluster is:

15 nodes, c4.2xlarge EC2 instances

15 shards, no replicas

There is a load balancer in front of all the instances, and the instances are accessed through the load balancer.

Marvel is running and is used to monitor the cluster

Refresh interval 1s

I could index 50k docs/sec on 9 nodes and only 70k docs/sec on 15 nodes. Shouldn't I be able to do more?
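For reference, here is a quick back-of-the-envelope check of those numbers. It assumes the ideal case where throughput scales linearly with node count, which is an idealization for comparison only and not something Elasticsearch guarantees.

```python
# Throughput figures quoted in the question.
nodes_before, rate_before = 9, 50_000    # docs/sec on the 9-node cluster
nodes_after, rate_after = 15, 70_000     # docs/sec on the 15-node cluster

# Average indexing rate per node in each setup.
per_node_before = rate_before / nodes_before
per_node_after = rate_after / nodes_after

# Assumption: ideal linear scaling, i.e. every added node contributes
# as much as each of the original 9 nodes did.
linear_target = per_node_before * nodes_after
scaling_efficiency = rate_after / linear_target

print(f"per node, 9 nodes:  {per_node_before:.0f} docs/sec")
print(f"per node, 15 nodes: {per_node_after:.0f} docs/sec")
print(f"linear target:      {linear_target:.0f} docs/sec")
print(f"efficiency:         {scaling_efficiency:.0%}")
```

So under that assumption the 15-node cluster would reach roughly 83k docs/sec, and the observed 70k docs/sec is about 84% of it.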

1 Answer

I'm not yet an expert on scalability and load balancing in ES, but here are some things to consider:

  • Load balancing should be native in ES: any node can accept a request and forward it to the nodes holding the relevant shards. Putting a load balancer in front can therefore interfere with the built-in distribution rather than help it. It's a bit like having a speed limiter on your car but still braking manually: the limiter should already do the job, and your manual intervention keeps it from doing it properly. Have you tried bypassing the load balancer and sending requests directly to the nodes to see how it fares?
  • More servers and shards give you more CPU overall, but they also add coordination overhead: a bulk request is split across the shards that own its documents, and a search has to query every shard. So if 1 shard can do N operations, M shards won't actually be able to do M*N.
  • Having 15 shards is probably overkill in a lot of cases.
  • Having 15 shards but no replicas is risky: if any one of your 15 servers fails, the shard it holds becomes unavailable and you can no longer access your whole index.
  • You can actually run multiple nodes on a single server.
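To try the first suggestion, a minimal sketch of sending bulk requests straight to the nodes in round-robin order could look like the following. The node addresses and the helper names `build_bulk_body` and `send_bulk_round_robin` are hypothetical, and the action line assumes an Elasticsearch version that does not require a `_type`; older versions would need one added.

```python
import itertools
import json
import urllib.request

# Hypothetical node addresses; replace with the real HTTP endpoints of your nodes.
NODES = ["http://es-node-1:9200", "http://es-node-2:9200", "http://es-node-3:9200"]


def build_bulk_body(index, docs):
    """Build the newline-delimited JSON body expected by the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document source
    return "\n".join(lines) + "\n"  # the body must end with a newline


def send_bulk_round_robin(batches, index, nodes=NODES):
    """Send each batch to the next node in turn, bypassing the load balancer."""
    node_cycle = itertools.cycle(nodes)
    for docs in batches:
        url = next(node_cycle) + "/_bulk"
        request = urllib.request.Request(
            url,
            data=build_bulk_body(index, docs).encode("utf-8"),
            headers={"Content-Type": "application/x-ndjson"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            yield url, response.status
```

`send_bulk_round_robin` is a generator, so nothing is sent until you iterate over it, for example with `for url, status in send_bulk_round_robin(batches, "my-index"): ...`. Comparing per-node CPU with and without the load balancer should show whether it is the source of the imbalance.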

What is your index size in terms of storage?
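If it helps, one way to get that number, and to check at the same time whether shards are spread evenly over the nodes, is the `_cat/shards` API. The sketch below assumes your version supports the `format=json` and `bytes=b` parameters; the helper names and the base URL are hypothetical.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_shards(base_url):
    """Fetch one row per shard from the _cat/shards API, with sizes in bytes."""
    url = base_url + "/_cat/shards?format=json&bytes=b"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_by_node(rows):
    """Return {node: (shard_count, total_store_bytes)} for assigned shards."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for row in rows:
        node = row.get("node")
        if not node:  # unassigned shards have no node
            continue
        counts[node] += 1
        sizes[node] += int(row.get("store") or 0)  # store may be missing or null
    return {node: (counts[node], sizes[node]) for node in counts}


# Example usage (hypothetical address):
# summary = summarize_by_node(fetch_shards("http://es-node-1:9200"))
# for node, (shards, size) in sorted(summary.items()):
#     print(node, shards, size)
```

If some nodes hold more shards of the index being written than others, that alone could explain why a few of them run hotter.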