
I'm a newbie to search and Elasticsearch. I have gone through some online docs and developed an app using an Elasticsearch setup in our test environment. So far, development and testing have gone smoothly. Now, to set up the cluster in production, I need some expert advice on:

  1. Number of shards
  2. Number of replicas
  3. Should I separate out master and data nodes?
  4. Can all the nodes be data nodes?
  5. I don't have any advanced search use case, but I at least need plural matching: (phone) should match all docs with phones and vice versa. Is any special stemming needed in this case?

My use case and traffic patterns are:

  1. Up to 100M reads per day
  2. Up to 1M writes/updates per day
  3. Initial data size 10 GB, growth rate 1 GB every 6 months

Cluster info:

  1. Initial cluster size: 14 machines, each with 28 GB RAM, a 120 GB spinning hard disk and 12 cores
  2. A load balancer with DNS, which would distribute the traffic to any of the 14 machines

I have used unicast discovery, and I have set bootstrap.mlockall: true and index.routing.allocation.disable_allocation: false

Please advise.

Thanks

1
I too am waiting for expert advice - Roopendra

1 Answer


1. Number of shards

The number of primary shards in an Elasticsearch index is a one-time setting: it is fixed when the index is created, and you cannot change it afterwards without reindexing. So you need to plan how many shards your cluster requires, taking into consideration your current dataset size plus any index growth. To do this, set up one Elasticsearch node with one shard and zero replicas on a box that has the same specifications as your production boxes.

The capacity of a single shard will depend on a number of factors:

  • The size of your documents

  • The size of your fields

  • The amount of RAM you assign to the JVM that runs Elasticsearch. If you have lots of aggregations, sorting and parent/child documents, you will need to make sure that you have assigned enough RAM to Elasticsearch so it can cache the results.

  • Your number of queries per second requirement.

  • The maximum search request response time allowed.

Index documents into your single-shard node in batches of x million (or fewer). After each batch, run benchmarks by executing x queries per second with a testing tool such as JMeter. When the response times in your tests approach your maximum allowed search response time, you have found the number of documents a single shard can hold. With this value you can calculate the number of shards required for your full dataset and how many more you will need for index growth.
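The arithmetic at the end of that procedure can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this answer, and the document counts and growth factor in the usage note are hypothetical placeholders, not measurements. The only figure taken from the question is the 100M reads per day.

```python
import math


def average_qps(requests_per_day: int) -> float:
    """Average queries per second for a daily request count.

    Peak traffic is usually well above the average, so benchmark with headroom.
    """
    return requests_per_day / 86_400  # seconds in a day


def shards_needed(total_docs: int, docs_per_shard: int, growth_factor: float = 1.0) -> int:
    """Primary shards required, given the benchmarked capacity of one shard.

    docs_per_shard is the document count at which the single-shard benchmark
    hit the maximum acceptable response time.
    growth_factor is an assumed multiplier for planned index growth.
    """
    if docs_per_shard <= 0:
        raise ValueError("docs_per_shard must be positive")
    return max(1, math.ceil(total_docs * growth_factor / docs_per_shard))
```

For example, 100M reads per day averages out to roughly 1157 queries per second, and if a benchmark showed one shard coping with 20 million documents, then `shards_needed(50_000_000, 20_000_000, growth_factor=1.5)` would suggest 4 primary shards.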

2. Number of replicas

Start with 1 replica. A replica shard is placed on a different node from its primary shard, so if one node goes down you still have the full dataset available. One replica is usually sufficient; if you find you need more, you can always add them later.
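As a rough sketch of what this looks like in practice, the request bodies below are built as plain Python dictionaries so that no particular client library is assumed. The first is the kind of body sent when creating an index, the second is the kind sent to the index's `_settings` endpoint to change the replica count later. The helper names and the shard count of 4 are hypothetical.

```python
import json


def create_index_body(primary_shards: int, replicas: int = 1) -> dict:
    """Body for creating an index; the primary shard count is fixed from here on."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def update_replicas_body(replicas: int) -> dict:
    """Body for the index _settings endpoint; replicas can be changed on a live index."""
    return {"index": {"number_of_replicas": replicas}}


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary plus each of its replicas counts as one shard copy in the cluster."""
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    print(json.dumps(create_index_body(4), indent=2))
    print(json.dumps(update_replicas_body(2)))
```

With 4 primaries and 1 replica the cluster holds 8 shard copies in total, which is worth checking against the number of data nodes you plan to run.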

3. Should I separate out master and data nodes?

It depends on the size of your cluster. If you have more than 5 nodes, it is advisable to have dedicated master-only nodes whose sole job is to maintain the cluster state.

4. Can all the nodes be data nodes?

There must always be at least one master node in your cluster; the master node maintains the cluster state. If you have a small cluster (< 5 nodes), you can make every node both a data node and a master-eligible node. One of the nodes will be elected as the master; if it goes down, another node in the cluster will be elected in its place. If you have master-only nodes as described in point 3, the rest of the nodes in the cluster can be data-only nodes.
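To make points 3 and 4 concrete, here is a small sketch of the per-node role settings and the usual quorum rule for master-eligible nodes. It assumes the `node.master` / `node.data` settings and the `discovery.zen.minimum_master_nodes` setting used by the older, Zen-discovery versions of Elasticsearch (the era of `bootstrap.mlockall`); newer releases configure this differently. The split of 3 dedicated masters is an assumption for illustration, not a recommendation derived from the question.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: (n / 2) + 1, using integer division.

    This is the value commonly used for discovery.zen.minimum_master_nodes
    to avoid a split brain.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def node_settings(dedicated_master: bool, master_eligible_nodes: int) -> dict:
    """elasticsearch.yml-style settings for one node, as a flat dictionary."""
    return {
        "node.master": dedicated_master,
        "node.data": not dedicated_master,
        "discovery.zen.minimum_master_nodes": quorum(master_eligible_nodes),
    }


if __name__ == "__main__":
    # Hypothetical layout: 3 dedicated masters, the remaining nodes data-only.
    print(node_settings(dedicated_master=True, master_eligible_nodes=3))
    print(node_settings(dedicated_master=False, master_eligible_nodes=3))
```

With 3 master-eligible nodes the quorum is 2, so the cluster can lose one master-eligible node and still elect a master.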

5. I don't have any advanced search use case, but I at least need plural matching: (phone) should match all docs with phones and vice versa. Is any special stemming needed in this case?

Yes, stemming will handle your use case.
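A minimal sketch of index settings that enable stemming is shown below, again as a plain Python dictionary rather than a call to any specific client. It defines a custom analyzer from the built-in `standard` tokenizer and the built-in `lowercase` and `porter_stem` token filters. The analyzer name `plural_stem` is a made-up label, and you would still need to assign the analyzer to the relevant text fields in your mapping, whose exact syntax depends on your Elasticsearch version.

```python
import json


def stemming_settings(analyzer_name: str = "plural_stem") -> dict:
    """Index settings with a custom analyzer that lowercases and stems tokens.

    With stemming applied at both index and search time, "phone" and "phones"
    reduce to the same term and therefore match each other.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "porter_stem"],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(stemming_settings(), indent=2))
```

The built-in `english` analyzer is a simpler alternative if you do not need a custom filter chain.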

Also, Elasticsearch ships with very good out-of-the-box defaults, so you should start by changing only the settings listed at the link below.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_important_configuration_changes.html