1
votes

I am indexing a large amount of daily data, ~160GB per index, into Elasticsearch. I am facing a case where I need to update almost all the docs in the indices with a small amount of data (~16GB), which is of the format:

id1,data1
id1,data2
id2,data1
id2,data2
id2,data3
.
.
.

My update operations start at 16,000 lines per second, but within about 5 minutes they drop to 1,000 lines per second and don't go back up after that. Updating this 16GB of data currently takes longer than indexing the entire 160GB.

My conf file for the update operation currently looks as follows:

output {
    elasticsearch {
        action => "update"
        doc_as_upsert => true
        hosts => ["host1","host2","host3","host4"]
        index => "logstash-2017-08-1"
        document_id => "%{uniqueid}"
        document_type => "daily"
        retry_on_conflict => 2
        flush_size => 1000
    }
}

The optimizations I have made to speed up indexing in my cluster, based on the suggestions at https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html, are the two below (the settings calls used to apply them are sketched right after the list):

  1. Setting "indices.store.throttle.type" : "none"
  2. Index "refresh_interval" : "-1"
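For reference, a minimal sketch of how those two settings are applied, assuming Elasticsearch 2.x (the store throttle setting was removed in 5.x) and the host and index names from the config above:

    # cluster-wide: disable store throttling while bulk loading
    curl -XPUT 'http://host1:9200/_cluster/settings' -d '{
      "transient": { "indices.store.throttle.type": "none" }
    }'

    # per index: disable periodic refreshes during the load (re-enable afterwards)
    curl -XPUT 'http://host1:9200/logstash-2017-08-1/_settings' -d '{
      "index": { "refresh_interval": "-1" }
    }'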

I am running my cluster on 4 d2.8xlarge EC2 instances. I have allocated 30GB of heap to each node. While the update is happening, barely any CPU is used and the load average is very low as well.

Despite all this, the update is extremely slow. Is there something obvious that I am missing that is causing this issue? While looking at the thread pool data I find that the number of threads working on bulk operations is constantly high.
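For context, the thread pool numbers come from the cat API; a quick way to watch the bulk pool while the update runs, using the same hosts as above:

    # active, queued and rejected tasks per thread pool on every node; bulk is the one to watch
    curl 'http://host1:9200/_cat/thread_pool?v'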

Any help on this issue would be much appreciated.

Thanks in advance

What language are you using? It sounds like you've got a memory leak somewhere, perhaps a file left open? – RisingSun

I am facing the same issue. Were you able to solve the slowness? – Zaid Amir

3 Answers

2
votes

There are a couple of rule-outs to try here.

Memory Pressure

With 244GB of RAM, this is not terribly likely, but you can still rule it out. Find the jstat command in the JDK for your platform (there are also visual tools for some of them). You want to check both your Logstash JVM and the Elasticsearch JVMs.

jstat -gcutil -h7 {PID of JVM} 2s

This will give you a readout of the various memory pools, garbage-collection counts, and GC timings for that JVM as it works, updating every 2 seconds and printing headers every 7 lines. Excessive time accumulating in FGCT (full GC time) is a sign that your heap is under-allocated.
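As a rough guide (the column names are from Java 8's -gcutil output, and the pgrep pattern is just one way to find the Elasticsearch PID):

    # S0/S1/E/O/M/CCS = survivor, eden, old gen, metaspace, compressed-class-space utilisation (%)
    # YGC/YGCT = young GC count/time; FGC/FGCT = full GC count/time; GCT = total GC time
    jstat -gcutil -h7 $(pgrep -f org.elasticsearch.bootstrap.Elasticsearch) 2s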

I/O Pressure

The d2.8xlarge is a dense-storage instance, and may not be great for a highly random, small-block workload. If you're on a Unix platform, top will tell you how much time you're spending in IOWAIT state. If it's high, your storage isn't up to the workload you're sending it.
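If iostat from the sysstat package is installed, it is a bit more direct than top; a sketch of what to run while an update is in flight:

    # extended per-device stats every 2 seconds; sustained high %iowait and ~100% %util
    # on the data disks mean the storage can't keep up
    iostat -x 2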

If that's the case, you may want to consider provisioned-IOPS EBS volumes rather than the instance-local storage. Or, if your data will fit, consider an instance in the i3 family of high-I/O instances instead.

Logstash version

You don't say which version of Logstash you're using. This being Stack Overflow, you're likely on 5.2; if that's the case, this isn't a rule-out.

But if you're using something in the 2.x series, you may want to set the -w flag to 1 at first and work your way up. Yes, that single-threads the pipeline, but the Elasticsearch output has some concurrency issues in the 2.x series that are largely fixed in the 5.x series.
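As a sketch, assuming the update pipeline from the question is saved as update.conf (a placeholder name), that looks like:

    # run the pipeline with a single worker, then raise -w once throughput is stable
    bin/logstash -w 1 -f update.conf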

0
votes

With Elasticsearch 6.0 we had exactly the same issue of slow updates on AWS, and the culprit was slow I/O. The same data upserted completely fine on a local test stack, but once in the cloud on an EC2 stack, everything died after an initial burst of speedy inserts lasting only a few minutes.

The local test stack was a low-spec server in terms of memory and CPU, but it had SSDs.

The EC2 stack was on EBS volumes of the default gp2 type, with 300 IOPS.

Converting the volumes to type io1 with 3000 IOPS solved the issue and everything got back on track.
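For reference, a change like that can be made in place with the AWS CLI; the volume ID below is a placeholder and the volume stays attached while it is modified:

    # convert an attached gp2 volume to io1 with 3000 provisioned IOPS
    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type io1 --iops 3000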

0
votes

I am using the Amazon AWS Elasticsearch service, version 6.0. I need to do heavy writes/inserts from a series of JSON files into Elasticsearch, about 10 billion items. The elasticsearch-py bulk write speed was really slow most of the time, with only occasional bursts of high-speed writes. I tried all kinds of methods, such as splitting the JSON files into smaller pieces, reading the JSON files with multiple processes, and inserting into Elasticsearch with parallel_bulk; nothing worked. Finally, after I upgraded to an io1 EBS volume with 10,000 write IOPS, everything went smoothly.