4 votes

We are running an elasticsearch-1.5.1 cluster with 6 nodes. In recent days I have been hitting a java.lang.OutOfMemoryError: PermGen space error in the cluster. This takes the affected node down, and I have to restart that node to bring it back.

We tried to reproduce the issue by putting heavy load on the cluster, but unfortunately could not. Yet the same error keeps occurring again and again in production.

Here is part of the elasticsearch.yml configuration:

index.recovery.initial_shards: 1
index.query.bool.max_clause_count: 8192
index.mapping.attachment.indexed_chars: 500000
index.merge.scheduler.max_thread_count: 1
cluster.routing.allocation.node_concurrent_recoveries: 15
indices.recovery.max_bytes_per_sec: 50mb
indices.recovery.concurrent_streams: 5

Memory configuration:

ES_HEAP_SIZE=10g
ES_JAVA_OPTS="-server -Des.max-open-files=true"
MAX_OPEN_FILES=65535
MAX_MAP_COUNT=262144
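
Note that none of the options above raises the PermGen limit; on a 64-bit Java 7 HotSpot JVM the default MaxPermSize is typically only around 82 MB. As a quick sanity check (using the same java binary as in the process listing further down), the effective value can be read from the JVM itself:

/usr/java/default/bin/java -XX:+PrintFlagsFinal -version | grep -i permsize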

Update: configuration details

I suspect merge.policy.max_merged_segment is related to this issue. We have 22 indices in the cluster, with the following merge.policy.max_merged_segment values (see the curl sketch after this list for how to read them back):

  • 7 indices have 20gb
  • 3 indices have 10gb
  • 12 indices have 5gb
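
For reference, these per-index values can be read back from the cluster with the index settings API (a quick sketch; localhost:9200 stands in for any node in the cluster):

curl -s 'http://localhost:9200/_all/_settings?pretty'

and then look for index.merge.policy.max_merged_segment under each index.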

Update: process information

esuser xxxxx 1 28 Oct03 ? 1-02:20:40 /usr/java/default/bin/java -Xms10g -Xmx10g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -server -Des.max-open-files=true -Delasticsearch -Des.pidfile=/var/es/elasticsearch.pid -Des.path.home=/usr/es/elasticsearch -cp :/usr/es/elasticsearch/lib/elasticsearch-1.5.1.jar:/usr/es/elasticsearch/lib/:/usr/es/elasticsearch/lib/sigar/ -Des.default.path.home=/usr/es/elasticsearch -Des.default.path.logs=/es/es_logs -Des.default.path.data=/es/es_data -Des.default.path.work=/es/es_work -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch
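
One way to watch the PermGen pool of this process over time is jstat against the PID from the ps output above (a monitoring sketch; <pid> is a placeholder, and it assumes a full JDK is installed at the same path):

/usr/java/default/bin/jstat -gc <pid> 5000
/usr/java/default/bin/jstat -gcutil <pid> 5000

The first samples the raw sizes every 5 seconds (on Java 7 the PC/PU columns are PermGen capacity and usage in KB); the second shows utilization percentages (the P column).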


Below is the stack trace I get from the Elasticsearch cluster during a search, but the same error also occurs at index time. My observation is that some search/index operations grow PermGen usage, and once it is exhausted, any subsequent operation that needs PermGen space triggers the error.

[2015-10-03 06:45:05,262][WARN ][transport.netty          ] [es_f2_01] Message not fully read (response) for [19353573] handler org.elasticsearch.search.action.SearchServiceTransportAction$6@21a25e37, error [true], resetting
[2015-10-03 06:45:05,262][DEBUG][action.search.type       ] [es_f2_01] [product_index][4], node[GoUqK7csTpezN5_xoNWbeg], [R], s[INITIALIZING]: Failed to execute [org.elasticsearch.action.search.SearchRequest@5c2fe4c4] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:176)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:128)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: PermGen space

Can anyone help me solve this issue? Thanks.

This is unusual for Elasticsearch. What ES version is this? Do you have the same settings in production and testing environments? What are the memory settings for ES? Run ps -ef | grep elasticsearch and provide the result. – Andrei Stefan
@AndreiStefan I added the ps output. The ES version and memory settings are already in the question, and the testing environment does not have exactly the same configuration as production. – Arun Prakash
Is that the complete stack trace of the exception? Is there a root exception, as well? – Andrei Stefan
Yes, this is the full stack trace I have in the log; there is no further cause, the actual exception is PermGen. – Arun Prakash
It's not only about config. Think about the queries you run, how you access the cluster (client access), etc. – Andrei Stefan

1 Answer

1 vote

The best solution is to upgrade to a Java 8 JVM.

While you could increase the permanent generation size of your Java 7 JVM (by setting -XX:MaxPermSize=... on an Oracle/HotSpot JVM), if you simply upgrade the JVM to version 8 you don't need to tune the PermGen size at all.
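
If you have to stay on Java 7 for now, a minimal sketch of that stopgap, based on the ES_JAVA_OPTS line from the question (the 256m value is an assumption and should be sized from your own PermGen monitoring), would be:

ES_JAVA_OPTS="-server -Des.max-open-files=true -XX:MaxPermSize=256m"

followed by a restart of the node.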

This is because Java 8 removes the permanent generation entirely: class metadata is stored in the Metaspace, which lives in native memory and grows as needed by default, so you no longer hit a fixed PermGen limit.
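
After upgrading, class metadata is capped only by available native memory unless you set -XX:MaxMetaspaceSize explicitly. A quick way to inspect the Metaspace defaults on the new JVM (a sketch):

java -XX:+PrintFlagsFinal -version | grep -i metaspacesize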