2
votes

I am blocked on indexing documents into Elasticsearch. I am trying to index (only) ~3.7M docs into an ES index, and after indexing ~3.4M docs (taking around 3 GB of disk space), the indexing rate has dropped to approximately 10 docs/min, which is very worrying.

The index (mistakenly) has only a single shard, which I suspect could be causing a bottleneck somewhere.

Elasticsearch version: 1.7.1

Node config:

m3.large (7.5 GB RAM, 32 GB SSD storage)
ES_HEAP_SIZE: 1 GB (this is what I see in KOPF, which also shows heap usage of ~400 MB out of ~1008 MB)
50 GB EBS volume attached to each node

We are using TransportClient to interact with ES. The BulkProcessor is configured with a bulk size of 5 MB, a flush interval of 2 minutes (so that batches smaller than 5 MB still get sent), and 6 concurrent requests. There can be ~10 bulk requests in flight to ES at a time.
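For reference, a minimal sketch of that BulkProcessor setup with the ES 1.x Java API; the listener bodies are placeholders, and `client` stands for the TransportClient instance:

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class BulkSetup {
    // Builds a BulkProcessor with the configuration described above.
    static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override public void beforeBulk(long id, BulkRequest req) {}
            @Override public void afterBulk(long id, BulkRequest req, BulkResponse resp) {
                if (resp.hasFailures()) {
                    System.err.println(resp.buildFailureMessage());
                }
            }
            @Override public void afterBulk(long id, BulkRequest req, Throwable t) {
                t.printStackTrace();
            }
        })
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // flush at 5 MB
        .setFlushInterval(TimeValue.timeValueMinutes(2))    // or every 2 minutes
        .setConcurrentRequests(6)                           // 6 bulks in flight
        .build();
    }
}
```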

After noticing the slowdown, I tried the following:


  1. Changed the cluster settings to threadpool.bulk.size: 2 and threadpool.bulk.queue_size: 80.
  2. Turned off index refreshes for my index.
  3. Set number_of_replicas to 0 (it was 1 earlier).
  4. Set indices.store.throttle.type to "none" to avoid indexing being throttled during segment merges.
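The changes above can be applied through the 1.x Java admin API roughly like this ("myindex" is a placeholder for the actual index name):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

public class TuningSettings {
    // Applies the index- and cluster-level changes from the list above.
    static void apply(Client client) {
        client.admin().indices().prepareUpdateSettings("myindex")
            .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.refresh_interval", "-1")  // turn off refreshes
                .put("index.number_of_replicas", 0))  // no replicas while loading
            .get();
        client.admin().cluster().prepareUpdateSettings()
            .setTransientSettings(ImmutableSettings.settingsBuilder()
                .put("indices.store.throttle.type", "none") // no merge throttling
                .put("threadpool.bulk.queue_size", 80))
            .get();
    }
}
```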

None of the above helped.

The KOPF dashboard shows CPU usage below 14%.

Please help. Thanks!


3 Answers

2
votes

I have used BulkProcessor to insert around 7 million records (in 150 secs) and didn't observe slowness at any point on the ES side. I used a single-threaded BulkProcessor implemented in Java using the transport client. The rest of my configuration was almost the same as yours.

If the above results seem promising, then perhaps you may try a few things:

  1. Check the runtime memory usage of your program (where you are reading/processing/writing). It might be hitting the maximum available memory; in the case of Java, check the heap allocated to the program.

  2. Try a multi-node configuration (I was using a three-node ES cluster on a single Windows machine).

  3. Try using multiple shards (I was using 5 shards).
  4. Try increasing the ES heap size, to around 1.5 GB.
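On point 3: the shard count is fixed at index creation time in ES 1.x, so going from 1 to 5 shards means creating a new index and reindexing into it. A sketch, with "myindex_v2" as a placeholder name:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

public class CreateShardedIndex {
    // Creates a new 5-shard index to reindex into; replicas stay at 0
    // during the bulk load and can be raised afterwards.
    static void create(Client client) {
        client.admin().indices().prepareCreate("myindex_v2")
            .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.number_of_shards", 5)
                .put("index.number_of_replicas", 0))
            .get();
    }
}
```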
1
votes

Do you get any RejectedExecutionException in your logs? That would indicate that the queues are filling up and indexing can't keep up. What about the number of open files/connections on your server? It might be worth checking both the per-user and global limits.

Regarding changing the threadpool.bulk.size setting: that is often a bad idea, as the defaults are usually sensible and are derived from the number of cores on your machine. Instead, try experimenting with the bulk request size to find a good value.
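You can also check for rejections via the nodes stats API rather than the logs. A sketch with the 1.x Java client (a growing rejected count for the "bulk" pool means its queue is overflowing):

```java
import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.threadpool.ThreadPoolStats;

public class RejectionCheck {
    // Prints the rejected-task count for every thread pool on every node.
    static void print(Client client) {
        NodesStatsResponse resp = client.admin().cluster()
            .prepareNodesStats().setThreadPool(true).get();
        for (NodeStats node : resp.getNodes()) {
            for (ThreadPoolStats.Stats s : node.getThreadPool()) {
                System.out.println(node.getNode().getName() + " "
                    + s.getName() + " rejected=" + s.getRejected());
            }
        }
    }
}
```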

0
votes

It is a guess, but if you are using bulkRequest.get() directly, without a BulkProcessor: a BulkRequestBuilder is not designed to be reused in a loop. The internal cleanup does not happen, so previously added requests accumulate, and performance exponentially deteriorates on the client (!) side. See this link for an example.