
I'm new to an SAP Hybris project, and for a few weeks now we have had the problem that our Hybris instance shuts itself down in production.

In production we have a backend server for the Backoffice, cronjobs, imports etc., and a frontend server for ... well, the frontend. Both are full Hybris applications. Our Solr also runs on a dedicated machine.

The frontend server runs fine the whole time; only our backend server shuts down without any apparent reason.

The backend server looks like this:

  • Java 8
  • Hybris 6.7
  • 36GB RAM
  • java.mem is 28GB
  • Ubuntu 16.04.2
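Since java.mem is 28GB on a 36GB host, one thing worth checking is whether the heap plus the JVM's non-heap memory (metaspace, thread stacks, code cache — often several GB on a large Hybris node) actually fits into what the OS has available. A minimal sketch, with the heap size hard-coded as an assumption:

```shell
# Compare the configured heap against what the OS reports as available.
# HEAP_GB mirrors java.mem; the JVM needs noticeably more than this in total.
HEAP_GB=28
awk -v heap="$HEAP_GB" '/MemAvailable/ {
  avail_gb = $2 / 1024 / 1024
  printf "available: %.1f GB, heap: %d GB, headroom: %.1f GB\n",
         avail_gb, heap, avail_gb - heap
}' /proc/meminfo
```

A negative or near-zero headroom right after the crash would explain why the wrapper's restart attempts fail with errno=12.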

This is what the console output says:

INFO   | jvm 1    | main    | 2020/08/27 09:04:50.411 | INFO  [cluster-ClusterId{value='5f462504979dc05c35c5afff', description='null'}-111.22.12.16:27017] [connection] Opened connection [connectionId{localValue:33, serverValue:8662}] to 111.22.12.16:27017
INFO   | jvm 1    | main    | 2020/08/27 09:04:50.513 | INFO  [cluster-ClusterId{value='5f462504979dc05c35c5afff', description='null'}-111.22.12.16:27017] [cluster] Monitor thread successfully connected to server with description ServerDescription{address=111.22.12.16:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 10]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=115245710}
INFO   | jvm 1    | main    | 2020/08/27 09:04:51.014 |
INFO   | jvm 1    | main    | 2020/08/27 09:05:01.224 | INFO  [RemoveFavouriteListEntryWithProductEmptyCronJob::de.hybris.platform.servicelayer.internal.jalo.ScriptingJob] (RemoveFavouriteListEntryWithProductEmptyCronJob) [ScriptingJobPerformable] ### Starting executing script : model://removeFavouriteListEntryWithProductEmptyScript ###
INFO   | jvm 1    | main    | 2020/08/27 09:05:01.224 | INFO  [update-shopOrderIndex-cronJob::de.hybris.platform.servicelayer.internal.jalo.ServicelayerJob] (update-shopOrderIndex-cronJob) [SolrIndexerJob] Started indexer cronjob.
INFO   | jvm 1    | main    | 2020/08/27 09:05:01.324 | INFO  [RemoveFavouriteListEntryWithProductEmptyCronJob::de.hybris.platform.servicelayer.internal.jalo.ScriptingJob] (RemoveFavouriteListEntryWithProductEmptyCronJob) [ScriptingJobPerformable] de.hybris.platform.servicelayer.internal.model.impl.DefaultModelService$$EnhancerBySpringCGLIB$$d7ee97a3@7ca8259bFavoritesListEntry removed []done
INFO   | jvm 1    | main    | 2020/08/27 09:05:01.324 | INFO  [RemoveFavouriteListEntryWithProductEmptyCronJob::de.hybris.platform.servicelayer.internal.jalo.ScriptingJob] (RemoveFavouriteListEntryWithProductEmptyCronJob) [ScriptingJobPerformable] ### Finished executing script, returned script result = null ###
INFO   | jvm 1    | main    | 2020/08/27 09:05:01.324 | INFO  [update-shopStockItemIndex-cronJob::de.hybris.platform.servicelayer.internal.jalo.ServicelayerJob] (update-shopStockItemIndex-cronJob) [SolrIndexerJob] Started indexer cronjob.
INFO   | jvm 1    | main    | 2020/08/27 09:05:01.425 | INFO  [update-shopQuoteIndex-cronJob::de.hybris.platform.servicelayer.internal.jalo.ServicelayerJob] (update-shopQuoteIndex-cronJob) [SolrIndexerJob] Started indexer cronjob.
INFO   | jvm 1    | main    | 2020/08/27 09:05:01.826 | INFO  [update-shopQuoteIndex-cronJob::de.hybris.platform.servicelayer.internal.jalo.ServicelayerJob] (update-shopQuoteIndex-cronJob) [SolrIndexerJob] Finished indexer cronjob.
INFO   | jvm 1    | main    | 2020/08/27 09:05:02.127 | INFO  [update-shopStockItemIndex-cronJob::de.hybris.platform.servicelayer.internal.jalo.ServicelayerJob] (update-shopStockItemIndex-cronJob) [SolrIndexerJob] Finished indexer cronjob.
INFO   | jvm 1    | main    | 2020/08/27 09:05:02.429 | INFO  [update-shopOrderIndex-cronJob::de.hybris.platform.servicelayer.internal.jalo.ServicelayerJob] (update-shopOrderIndex-cronJob) [SolrIndexerJob] Finished indexer cronjob.
INFO   | jvm 1    | main    | 2020/08/27 09:05:02.931 |
INFO   | jvm 1    | main    | 2020/08/27 09:05:27.177 | WARN  [jgroups-11,shoplive-broadcast,hybrisnode-1] [NAKACK2] JGRP000011: hybrisnode-1: dropped message batch from non-member hybrisnode-2 (view=[hybrisnode-1|2] (1) [hybrisnode-1])
INFO   | jvm 1    | main    | 2020/08/27 09:05:27.677 |
INFO   | jvm 1    | main    | 2020/08/27 09:05:49.412 | INFO  [solrclient-cleanup-master] [DefaultSolrClientPool] Checking pooled Solr client [config=shopSolrServerConfig]
INFO   | jvm 1    | main    | 2020/08/27 09:05:49.912 |
INFO   | jvm 1    | main    | 2020/08/27 09:06:19.376 | INFO  [cluster-ClusterId{value='5f462504979dc05c35c5afff', description='null'}-111.22.12.16:27017] [connection] Opened connection [connectionId{localValue:34, serverValue:8664}] to 111.22.12.16:27017
INFO   | jvm 1    | main    | 2020/08/27 09:06:19.877 |
INFO   | jvm 1    | main    | 2020/08/27 09:06:29.399 | WARN  [jgroups-11,shoplive-broadcast,hybrisnode-1] [NAKACK2] JGRP000011: hybrisnode-1: dropped message batch from non-member hybrisnode-2 (view=[hybrisnode-1|2] (1) [hybrisnode-1]) (received 3 identical messages from hybrisnode-2 in the last 62247 ms)
INFO   | jvm 1    | main    | 2020/08/27 09:06:29.900 |
INFO   | jvm 1    | main    | 2020/08/27 09:06:49.436 | INFO  [solrclient-cleanup-master] [DefaultSolrClientPool] Checking pooled Solr client [config=shopSolrServerConfig]
INFO   | jvm 1    | main    | 2020/08/27 09:06:49.937 |
INFO   | jvm 1    | main    | 2020/08/27 09:07:49.358 | INFO  [solrclient-cleanup-master] [DefaultSolrClientPool] Checking pooled Solr client [config=shopSolrServerConfig]
INFO   | jvm 1    | main    | 2020/08/27 09:07:49.860 |
INFO   | jvm 1    | main    | 2020/08/27 09:07:51.064 | WARN  [jgroups-25,shoplive-broadcast,hybrisnode-1] [NAKACK2] JGRP000011: hybrisnode-1: dropped message batch from non-member hybrisnode-2 (view=[hybrisnode-1|2] (1) [hybrisnode-1]) (received 2 identical messages from hybrisnode-2 in the last 81647 ms)
INFO   | jvm 1    | main    | 2020/08/27 09:07:51.565 |
INFO   | jvm 1    | main    | 2020/08/27 09:08:20.737 | INFO  [cluster-ClusterId{value='5f462504979dc05c35c5afff', description='null'}-111.22.12.16:27017] [connection] Opened connection [connectionId{localValue:35, serverValue:8665}] to 111.22.12.16:27017
INFO   | jvm 1    | main    | 2020/08/27 09:08:21.238 |
STATUS | wrapper  | main    | 2020/08/27 09:08:35.081 | Pinging the JVM took 2 seconds to respond.
INFO   | jvm 1    | main    | 2020/08/27 09:08:49.617 | INFO  [solrclient-cleanup-master] [DefaultSolrClientPool] Checking pooled Solr client [config=shopSolrServerConfig]
INFO   | jvm 1    | main    | 2020/08/27 09:08:50.118 |
INFO   | jvm 1    | main    | 2020/08/27 09:09:16.324 | INFO  [cluster-ClusterId{value='5f462504979dc05c35c5afff', description='null'}-111.22.12.16:27017] [connection] Opened connection [connectionId{localValue:36, serverValue:8688}] to 111.22.12.16:27017
INFO   | jvm 1    | main    | 2020/08/27 09:09:16.825 |
STATUS | wrapper  | main    | 2020/08/27 09:09:46.483 | Pinging the JVM took 1 seconds to respond.
STATUS | wrapper  | main    | 2020/08/27 09:09:52.562 | JVM received a signal SIGKILL (9).
STATUS | wrapper  | main    | 2020/08/27 09:09:52.563 | JVM process is gone.
ERROR  | wrapper  | main    | 2020/08/27 09:09:52.563 | JVM exited unexpectedly.
STATUS | wrapper  | main    | 2020/08/27 09:09:52.984 | Reloading Wrapper configuration...
STATUS | wrapper  | main    | 2020/08/27 09:09:53.209 | Launching a JVM...
INFO   | jvm 2    | main    | 2020/08/27 09:09:53.415 | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e9990000, 29366878208, 0) failed; error='Cannot allocate memory' (errno=12)
INFO   | jvm 2    | main    | 2020/08/27 09:09:53.416 | #
INFO   | jvm 2    | main    | 2020/08/27 09:09:53.416 | # There is insufficient memory for the Java Runtime Environment to continue.
INFO   | jvm 2    | main    | 2020/08/27 09:09:53.416 | # Native memory allocation (mmap) failed to map 29366878208 bytes for committing reserved memory.
INFO   | jvm 2    | main    | 2020/08/27 09:09:53.416 | # An error report file with more information is saved as:
INFO   | jvm 2    | main    | 2020/08/27 09:09:53.416 | # /home/user/hybris/bin/platform/tomcat/bin/hs_err_pid31526.log
ERROR  | wrapper  | main    | 2020/08/27 09:09:53.459 | JVM exited while loading the application.
STATUS | wrapper  | main    | 2020/08/27 09:09:53.862 | Reloading Wrapper configuration...
STATUS | wrapper  | main    | 2020/08/27 09:09:54.076 | Launching a JVM...
ERROR  | wrapper  | main    | 2020/08/27 09:09:54.117 | JVM exited while loading the application.
INFO   | jvm 3    | main    | 2020/08/27 09:09:54.118 | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e9990000, 29366878208, 0) failed; error='Cannot allocate memory' (errno=12)
INFO   | jvm 3    | main    | 2020/08/27 09:09:54.118 | #
INFO   | jvm 3    | main    | 2020/08/27 09:09:54.118 | # There is insufficient memory for the Java Runtime Environment to continue.
INFO   | jvm 3    | main    | 2020/08/27 09:09:54.118 | # Native memory allocation (mmap) failed to map 29366878208 bytes for committing reserved memory.
INFO   | jvm 3    | main    | 2020/08/27 09:09:54.118 | # An error report file with more information is saved as:
INFO   | jvm 3    | main    | 2020/08/27 09:09:54.118 | # /home/user/hybris/bin/platform/tomcat/bin/hs_err_pid31536.log
STATUS | wrapper  | main    | 2020/08/27 09:09:54.521 | Reloading Wrapper configuration...
STATUS | wrapper  | main    | 2020/08/27 09:09:54.737 | Launching a JVM...
ERROR  | wrapper  | main    | 2020/08/27 09:09:54.773 | JVM exited while loading the application.
INFO   | jvm 4    | main    | 2020/08/27 09:09:54.773 | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e9990000, 29366878208, 0) failed; error='Cannot allocate memory' (errno=12)
INFO   | jvm 4    | main    | 2020/08/27 09:09:54.773 | #
INFO   | jvm 4    | main    | 2020/08/27 09:09:54.773 | # There is insufficient memory for the Java Runtime Environment to continue.
INFO   | jvm 4    | main    | 2020/08/27 09:09:54.773 | # Native memory allocation (mmap) failed to map 29366878208 bytes for committing reserved memory.
INFO   | jvm 4    | main    | 2020/08/27 09:09:54.773 | # An error report file with more information is saved as:
INFO   | jvm 4    | main    | 2020/08/27 09:09:54.773 | # /home/user/hybris/bin/platform/tomcat/bin/hs_err_pid31546.log
STATUS | wrapper  | main    | 2020/08/27 09:09:55.176 | Reloading Wrapper configuration...
STATUS | wrapper  | main    | 2020/08/27 09:09:55.385 | Launching a JVM...
ERROR  | wrapper  | main    | 2020/08/27 09:09:55.425 | JVM exited while loading the application.
INFO   | jvm 5    | main    | 2020/08/27 09:09:55.425 | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e9990000, 29366878208, 0) failed; error='Cannot allocate memory' (errno=12)
INFO   | jvm 5    | main    | 2020/08/27 09:09:55.425 | #
INFO   | jvm 5    | main    | 2020/08/27 09:09:55.425 | # There is insufficient memory for the Java Runtime Environment to continue.
INFO   | jvm 5    | main    | 2020/08/27 09:09:55.425 | # Native memory allocation (mmap) failed to map 29366878208 bytes for committing reserved memory.
INFO   | jvm 5    | main    | 2020/08/27 09:09:55.425 | # An error report file with more information is saved as:
INFO   | jvm 5    | main    | 2020/08/27 09:09:55.425 | # /home/user/hybris/bin/platform/tomcat/bin/hs_err_pid31556.log
STATUS | wrapper  | main    | 2020/08/27 09:09:55.827 | Reloading Wrapper configuration...
STATUS | wrapper  | main    | 2020/08/27 09:09:56.039 | Launching a JVM...
ERROR  | wrapper  | main    | 2020/08/27 09:09:56.076 | JVM exited while loading the application.
INFO   | jvm 6    | main    | 2020/08/27 09:09:56.077 | Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e9990000, 29366878208, 0) failed; error='Cannot allocate memory' (errno=12)
INFO   | jvm 6    | main    | 2020/08/27 09:09:56.077 | #
INFO   | jvm 6    | main    | 2020/08/27 09:09:56.077 | # There is insufficient memory for the Java Runtime Environment to continue.
INFO   | jvm 6    | main    | 2020/08/27 09:09:56.077 | # Native memory allocation (mmap) failed to map 29366878208 bytes for committing reserved memory.
INFO   | jvm 6    | main    | 2020/08/27 09:09:56.077 | # An error report file with more information is saved as:
INFO   | jvm 6    | main    | 2020/08/27 09:09:56.077 | # /home/user/hybris/bin/platform/tomcat/bin/hs_err_pid31566.log
FATAL  | wrapper  | main    | 2020/08/27 09:09:56.378 | There were 5 failed launches in a row, each lasting less than 300 seconds.  Giving up.
FATAL  | wrapper  | main    | 2020/08/27 09:09:56.379 |   There may be a configuration problem: please check the logs.
STATUS | wrapper  | main    | 2020/08/27 09:09:56.580 | <-- Wrapper Stopped

To me it looks like the server shuts down for no reason, but not completely: it keeps holding on to its RAM, and when it then tries to restart, there is no longer enough memory available.

This is also the first project where I have seen Hybris shut itself down. I have already worked on four other big projects and never had this kind of problem.

I can see that too, but that's not it. After it kills itself there is enough RAM, and there is enough RAM for all the tasks it runs. And Java should manage its RAM usage itself, shouldn't it? – Mr.Tr33

The log says you have insufficient memory. What is the max heap size set to? You should be looking for the -Xmx setting. Do you start it with a shell script? – rjdkolb

-Xms and -Xmx are both set to 28GB. And yes, I start the server through the normal hybrisserver.sh script, which starts tomcat/bin/catalina.sh, which in turn starts tomcat/bin/wrapper.sh. – Mr.Tr33

Add the flag -XX:+HeapDumpOnOutOfMemoryError to the JAVA_OPTS; this will create a dump file when the JVM runs out of heap. That dump file can then be opened with a heap analysis tool and you can debug it from there. – rjdkolb

Thank you, I will add the flag and see. :) – Mr.Tr33
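For reference, in a standard Hybris setup JVM flags like the one suggested in the comments usually go into config/local.properties rather than a raw JAVA_OPTS export. A sketch, with the property name and dump path assumed (verify against your platform version):

```
# config/local.properties (sketch; property name and path are assumptions)
# Note: overriding tomcat.generaloptions replaces the platform default option
# string, so carry over the existing options (including the heap settings)
# from the platform's project.properties.
tomcat.generaloptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/user/hybris/log/heapdumps
```

One caveat: this flag only fires on a Java-heap OutOfMemoryError inside a running JVM; the errno=12 failures in the log above are native mmap failures at startup, which produce the hs_err_pid*.log files instead.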

1 Answer


This looks like a non-standard termination to me. In the log you can see

JVM received a signal SIGKILL (9)

That is something external sending a kill -9. If the wrapper did it, you would see messages about it requesting a shutdown and then timing out that shutdown operation before it finally resorts to kill -9. This log shows something going straight in with kill -9.

I guess you will need to look at external systems and/or the OS itself: on Linux, the kernel's OOM killer sends exactly this kind of SIGKILL when the system is under severe memory pressure. I know you say the system has plenty of memory, but could something else be consuming that memory elsewhere ... or could this be a VM where the actual hardware is over-subscribed, so the VM does not really have as much memory as it appears to?
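If it was the kernel OOM killer, it leaves a trace in the kernel log. A quick check, with log paths assumed for Ubuntu 16.04:

```shell
# Search the kernel log for OOM-killer activity around the crash time.
# A line like "Out of memory: Kill process 31526 (java)" would confirm
# that the kernel, not an admin or the wrapper, sent the SIGKILL.
for log in /var/log/kern.log /var/log/syslog; do
  [ -r "$log" ] && grep -iE 'out of memory|oom.?killer|killed process' "$log"
done
dmesg | grep -iE 'out of memory|oom.?killer|killed process' || true
```

If nothing turns up there, the next suspects are an external monitoring/management agent or the hypervisor, per the reasoning above.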