I am blocked on indexing documents into Elasticsearch. I am trying to index (only) ~3.7M docs into an ES index, and after indexing ~3.4M docs (taking around 3 GB of disk space), the indexing rate has dropped to approximately 10 docs/min, which is very worrying.
The index (mistakenly) has only a single shard, which I think could be causing a bottleneck somewhere.
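To put the slowdown in numbers, here is a quick back-of-the-envelope sketch in plain Java. The inputs are the approximate figures quoted above; the class and method names are made up for illustration, and I am assuming "3 GB" means binary gigabytes. At 10 docs/min the remaining ~300,000 docs would take about three weeks.

```java
// Back-of-the-envelope numbers for the slowdown described above.
// All names here are hypothetical; the inputs are the approximate figures from the post.
final class IndexingEstimate {

    // Documents still to be indexed.
    static long remainingDocs(long totalDocs, long indexedDocs) {
        return totalDocs - indexedDocs;
    }

    // Days needed to index the remainder at a constant rate.
    static double daysToFinish(long remainingDocs, double docsPerMinute) {
        double minutes = remainingDocs / docsPerMinute;
        return minutes / (60.0 * 24.0);
    }

    // Average on-disk bytes per indexed document (integer division).
    static long avgBytesPerDoc(long indexBytes, long indexedDocs) {
        return indexBytes / indexedDocs;
    }

    public static void main(String[] args) {
        long total = 3_700_000L;                    // ~3.7M docs overall
        long indexed = 3_400_000L;                  // ~3.4M docs indexed so far
        long indexBytes = 3L * 1024 * 1024 * 1024;  // ~3 GB, assuming binary gigabytes

        long remaining = remainingDocs(total, indexed);
        System.out.printf("remaining docs: %d%n", remaining);
        System.out.printf("days at 10 docs/min: %.1f%n", daysToFinish(remaining, 10.0));
        System.out.printf("avg bytes/doc on disk: %d%n", avgBytesPerDoc(indexBytes, indexed));
    }
}
```

The bytes-per-doc figure is only the on-disk average for the index, not the size of the source documents sent in each bulk request.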
Elasticsearch version: 1.7.1
Node config:
m3.large (7.5 GB RAM, 32 GB SSD storage)
ES_HEAP_SIZE: 1 GB (this is what I see in KOPF, which also shows heap usage of ~400 MB out of ~1008 MB)
A 50 GB EBS volume attached to each node
We are using TransportClient to interact with ES. The BulkProcessor is configured with a bulk size of 5 MB, a flush interval of 2 min (we are bulk indexing, so this is to avoid sending requests smaller than 5 MB), and 6 concurrent requests. There can be ~10 bulk requests in flight to ES in parallel.
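For reference, a BulkProcessor with the settings described above can be built roughly as follows with the 1.x Java API. This is a minimal sketch, not our exact code: the class name is made up, it needs the Elasticsearch 1.7 client jar on the classpath, and the `setBulkActions(-1)` line is an assumption (it disables the default 1000-action trigger so that only size and time cause a flush). The listener just logs failures, which is where bulk rejections would show up.

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

// Hypothetical helper class; only the BulkProcessor calls come from the ES 1.x Java API.
final class BulkProcessorSketch {

    static BulkProcessor build(Client client) {
        BulkProcessor.Listener listener = new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                System.out.println("bulk " + executionId + ": sending "
                        + request.numberOfActions() + " actions");
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // Per-item failures (e.g. rejected executions) are reported here.
                if (response.hasFailures()) {
                    System.err.println("bulk " + executionId + " had failures: "
                            + response.buildFailureMessage());
                }
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // The whole request failed (e.g. no node available).
                System.err.println("bulk " + executionId + " failed: " + failure);
            }
        };

        return BulkProcessor.builder(client, listener)
                .setBulkActions(-1)                                  // assumption: flush by size/time only
                .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))  // 5 MB per bulk request
                .setFlushInterval(TimeValue.timeValueMinutes(2))     // flush at least every 2 min
                .setConcurrentRequests(6)                            // up to 6 requests in flight
                .build();
    }
}
```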
After seeing the indexing rate slow down, I did the following:
> Changed the cluster settings threadpool.bulk.size to 2 and threadpool.bulk.queue_size to 80.
> Turned off index refreshes for my index.
> Set number_of_replicas to 0 (earlier it was 1).
> Set indices.store.throttle.type to "none" to avoid indexing being throttled during segment merges.
None of the above helped.
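In case the way the changes were applied matters, the settings listed above correspond roughly to the following calls in the 1.x Java API. This is a sketch under assumptions: the class name and `indexName` parameter are placeholders, the cluster-level settings are shown as transient (they could equally be persistent), and refreshes are turned off by setting `index.refresh_interval` to -1.

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

// Hypothetical helper class; requires the Elasticsearch 1.7 client jar.
final class IndexingSettingsSketch {

    static void apply(Client client, String indexName) {
        // Cluster-level settings (assumed transient here).
        client.admin().cluster().prepareUpdateSettings()
                .setTransientSettings(ImmutableSettings.settingsBuilder()
                        .put("threadpool.bulk.size", 2)
                        .put("threadpool.bulk.queue_size", 80)
                        .put("indices.store.throttle.type", "none")
                        .build())
                .get();

        // Index-level settings: no refreshes, no replicas.
        client.admin().indices().prepareUpdateSettings(indexName)
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.refresh_interval", "-1")
                        .put("index.number_of_replicas", 0)
                        .build())
                .get();
    }
}
```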
The KOPF dashboard shows CPU usage below 14%.
Please help. Thanks!