0
votes

I need to find a way (if any) to accelerate indexing speed.
With my current cluster setup (8 storage-optimized data nodes and 2 memory-optimized master nodes), indexing the data takes approximately 20 hours.
The data volume is relatively large (~1 TB) once stored in shards.

All nodes run on AWS EC2 instances. Only the master nodes are attached to the load balancer (ALB) through which all requests to Elasticsearch pass, so every bulk indexing request goes to the load balancer, then to one of the master nodes, and finally to the data nodes.
The following is set before bulk indexing:

[Cluster]

  • 8 storage optimized dedicated data nodes
  • 2 memory optimized dedicated master nodes

[Index]

  • number_of_shards: 6
  • number_of_replicas: 0
  • refresh_interval: -1
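
For reference, here is a minimal sketch of the request bodies these index settings correspond to. The helper names are made up for illustration, and the post-load values (1 replica, `1s` refresh) are only assumed targets, not something I have settled on. The first body would go to the create-index request and the second to `PUT /<index>/_settings` after the load.

```python
import json


def bulk_load_settings(shards: int = 6) -> dict:
    """Create-index body with the settings used during the bulk load."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,  # fixed when the index is created
                "number_of_replicas": 0,     # no replica copies while loading
                "refresh_interval": "-1",    # periodic refresh disabled
            }
        }
    }


def post_load_settings(replicas: int = 1, refresh_interval: str = "1s") -> dict:
    """Settings-update body to apply once the load has finished.

    The defaults here are assumptions for illustration only.
    """
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


print(json.dumps(bulk_load_settings(), indent=2))
print(json.dumps(post_load_settings(), indent=2))
```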

Is there any way to improve the indexing performance of the cluster with these settings?

2 Answers

2
votes

The Elasticsearch Reference has a "Tune for indexing speed" page. Besides the index settings, and more specifically index.refresh_interval, you can also configure the indices.memory.index_buffer_size setting.

From the above-mentioned docs:

be sure indices.memory.index_buffer_size is large enough to give at most 512 MB indexing buffer per shard doing heavy indexing (beyond that indexing performance does not typically improve). Elasticsearch takes that setting (a percentage of the java heap or an absolute byte-size), and uses it as a shared buffer across all active shards. Very active shards will naturally use this buffer more than shards that are performing lightweight indexing.
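
To make the quoted guidance concrete, here is a rough back-of-the-envelope sketch. It assumes the buffer is split evenly across the active shards on a node, which is a simplification, since busy shards take a larger share. The 4 GB heap is a hypothetical value, the function names are made up, and 10% is the default for indices.memory.index_buffer_size.

```python
def index_buffer_mb(heap_gb: float, buffer_percent: float = 10.0) -> float:
    """Total shared indexing buffer on one node, in MB."""
    return heap_gb * 1024 * buffer_percent / 100


def per_shard_share_mb(heap_gb: float, active_shards: int,
                       buffer_percent: float = 10.0) -> float:
    """Even split of the node's buffer across its active shards (simplified)."""
    if active_shards < 1:
        raise ValueError("active_shards must be at least 1")
    return index_buffer_mb(heap_gb, buffer_percent) / active_shards


def percent_for_target(heap_gb: float, active_shards: int,
                       target_mb: float = 512.0) -> float:
    """Heap percentage needed so each active shard could get target_mb."""
    if active_shards < 1:
        raise ValueError("active_shards must be at least 1")
    return 100 * target_mb * active_shards / (heap_gb * 1024)


heap_gb = 4  # hypothetical heap size of one data node
print(per_shard_share_mb(heap_gb, active_shards=1))
print(percent_for_target(heap_gb, active_shards=1))
```

With 6 shards spread over 8 data nodes, each node would hold at most one actively indexed shard if the shards are balanced, so whether the default is already enough depends mainly on the heap size of your data nodes.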

You could also optimize the mappings of your documents. For example, where possible, use auto-generated ids and disable any feature you do not need (such as the _field_names field or match phrase prefix queries).
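
As an illustration of the auto-generated id point, here is a minimal sketch that builds a bulk request body in which no action line carries an `_id`, so Elasticsearch assigns the ids itself. The index name `my-index`, the sample documents, and the helper name are hypothetical. The resulting string would be sent to the `_bulk` endpoint as newline-delimited JSON.

```python
import json
from typing import Iterable


def bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Build an NDJSON bulk body: one action line plus one source line per doc."""
    parts = []
    for doc in docs:
        # No "_id" in the action metadata, so the id is auto-generated.
        parts.append(json.dumps({"index": {"_index": index}}) + "\n")
        parts.append(json.dumps(doc) + "\n")
    # Every line, including the last one, ends with a newline.
    return "".join(parts)


docs = [{"user": "alice", "bytes": 120}, {"user": "bob", "bytes": 87}]
print(bulk_body("my-index", docs), end="")
```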

1
votes

I would rather increase the number of shards from 6 to at least 50. On average you can keep around 25 to 50 GB of data per shard; don't make shards too small or too big. If you increase the shard count, you should see a performance gain for both writes and reads.
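
As a rough check of the sizing rule, here is a small sketch that turns a target shard size into a shard count. It assumes "~1 TB" means about 1000 GB of primary data; the function name is made up for illustration.

```python
import math


def shard_count_range(total_gb: float,
                      min_shard_gb: float = 25,
                      max_shard_gb: float = 50) -> tuple:
    """Return (fewest, most) primary shards that keep each shard in the size range."""
    fewest = math.ceil(total_gb / max_shard_gb)  # largest allowed shards
    most = math.ceil(total_gb / min_shard_gb)    # smallest allowed shards
    return fewest, most


print(shard_count_range(1000))  # (20, 40)
```

Under that assumption the 25 to 50 GB rule gives roughly 20 to 40 shards, and 50 shards would put each one at about 20 GB, slightly below the range, so treat the exact number as something to test rather than a fixed target.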