We are using Apache Ignite as a cache to speed up our authorization and permission calls. During application load on a client, the application hits the Ignite cache with around 18 get calls, and during this period we get frequent "cache closed" exceptions from Ignite. We have tried to replicate the issue with a larger number of calls, but even when run locally the error occurs for roughly 3-4 of the 18 calls.
We have applied all the recommended configurations to the cluster:
    <!-- Alter configuration below as needed. -->
    <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="peerClassLoadingEnabled" value="true" />

        <property name="cacheConfiguration">
            <list>
                <!-- Partitioned cache example configuration (Atomic mode). -->
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="default" />
                    <property name="atomicityMode" value="ATOMIC" />
                    <property name="backups" value="1" />
                </bean>
            </list>
        </property>

        <property name="includeEventTypes">
            <list>
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_STARTED" />
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FINISHED" />
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FAILED" />
            </list>
        </property>

        <property name="binaryConfiguration">
            <bean class="org.apache.ignite.configuration.BinaryConfiguration">
                <property name="compactFooter" value="false" />
            </bean>
        </property>

        <!-- Configure internal thread pool. -->
        <property name="publicThreadPoolSize" value="64" />

        <!-- Configure system thread pool. -->
        <property name="systemThreadPoolSize" value="32" />

        <!--<property name="clientMode" value="false" />-->

        <property name="sqlSchemas">
            <list>
                <value>BA_EV</value>
                <value>BA_DEMO</value>
                <value>TEST_1</value>
                <value>TEST_2</value>
                <value>TEST_3</value>
            </list>
        </property>

        <!-- Enabling Apache Ignite Persistent Store. -->
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="persistenceEnabled" value="true" />
                    </bean>
                </property>
            </bean>
        </property>

        <property name="communicationSpi">
            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="slowClientQueueLimit" value="1000" />
                <property name="messageQueueLimit" value="1024" />
            </bean>
        </property>

        <!-- Enabling authentication. -->
        <property name="authenticationEnabled" value="true" />

        <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <!--
                        Ignite provides several options for automatic discovery that can be used
                        instead of static IP based discovery. For information on all options refer
                        to our documentation: http://apacheignite.readme.io/docs/cluster-config
                    -->
                    <!-- Uncomment static IP finder to enable static-based discovery of initial nodes. -->
                    <!--<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">-->
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                        <property name="addresses">
                            <list>
                                <!-- In distributed environment, replace with actual host IP address. -->
                                <value>localhost:47500..47509</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>
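For reference, this is roughly how the client node is started from that XML and how a cache handle is obtained. This is a minimal sketch only; the config path, class name, and key/value types are illustrative, not our exact code (the cache name is taken from the logs below):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class IgniteClientBootstrap {
    public static void main(String[] args) {
        // Join the cluster as a client node, matching clientMode=true in the logs below.
        Ignition.setClientMode(true);

        // The path to the Spring XML above is an assumption; adjust to the real location.
        try (Ignite ignite = Ignition.start("config/ignite-config.xml")) {
            // Cache name taken from the log output; it is created on the server side.
            IgniteCache<String, Object> cache = ignite.cache("userRoleCache");

            cache.put("user-1", "ROLE_USER");        // same simple put/get pattern we use
            System.out.println(cache.get("user-1"));
        }
    }
}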
We are using simple cache.get() and cache.put() calls, which are invoked through annotations.
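The annotation wiring is roughly along the lines of this simplified sketch (assuming a Spring AOP style interceptor; the annotation, aspect, and cache-key scheme are placeholders, not our actual implementation):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// Placeholder annotation marking methods whose results should be cached in Ignite.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface IgniteCached {
    String cacheName();
}

@Aspect
@Component
class IgniteCacheAspect {
    @Autowired
    private Ignite ignite;

    @Around("@annotation(cached)")
    public Object readThrough(ProceedingJoinPoint pjp, IgniteCached cached) throws Throwable {
        IgniteCache<String, Object> cache = ignite.cache(cached.cacheName());
        String key = pjp.getSignature().toShortString() + Arrays.toString(pjp.getArgs());

        Object value = cache.get(key);   // try Ignite first
        if (value == null) {
            value = pjp.proceed();       // fall back to the actual call
            cache.put(key, value);       // populate the cache for the next lookup
        }
        return value;
    }
}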
For the time being we are using the following to work around the error:
try {
    cache.put(key, returnType.cast(result));
} catch (CacheException | org.hibernate.cache.CacheException e) {
    if (e.getCause() instanceof IgniteClientDisconnectedException) {
        IgniteClientDisconnectedException cause =
                (IgniteClientDisconnectedException) e.getCause();
        cause.reconnectFuture().get();           // Wait for reconnection.
        cache.put(key, returnType.cast(result)); // Retry the put once the client has rejoined.
        addDiconnectCount();
        LOGGER.error("Disconnection Reason Trace: ", e);
    }
}
However, we are unable to figure out the underlying cause of the frequent cache reconnection. We ran locally with 4 GB Xmx; about 86% of the heap was unused, and the load on both the Ignite cluster node and the client was negligible.
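A slightly more general version of that workaround, which we are considering, wraps any cache operation, waits on the reconnect future, and retries once. The class and method names here are only illustrative:

import java.util.function.Supplier;

import javax.cache.CacheException;

import org.apache.ignite.IgniteClientDisconnectedException;

public final class IgniteReconnectRetry {

    private IgniteReconnectRetry() {
    }

    // Runs a cache operation; on client disconnect, waits for reconnection and retries once.
    public static <T> T withReconnect(Supplier<T> op) {
        try {
            return op.get();
        } catch (CacheException e) {
            if (e.getCause() instanceof IgniteClientDisconnectedException) {
                IgniteClientDisconnectedException cause =
                        (IgniteClientDisconnectedException) e.getCause();
                cause.reconnectFuture().get(); // block until the client rejoins the cluster
                return op.get();               // single retry after reconnect
            }
            throw e;
        }
    }
}

Usage would then be, for example: IgniteReconnectRetry.withReconnect(() -> cache.get(key));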
The objects that we are trying to store are of the following types:
Map<String, Serializable>
map.put(String, String[]);
List<String>
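Concretely, the cached values have roughly these shapes (the keys and contents below are made up for illustration):

import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CachedValueShapes {
    public static void main(String[] args) {
        // A Map<String, Serializable> whose values are String arrays.
        Map<String, Serializable> permissionsByRole = new HashMap<>();
        permissionsByRole.put("ROLE_ADMIN", new String[] {"READ", "WRITE", "DELETE"});

        // A plain list of role names.
        List<String> roles = Arrays.asList("ROLE_ADMIN", "ROLE_USER");

        System.out.println(permissionsByRole.size() + " entries, " + roles.size() + " roles");
    }
}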
Logs:
>>> +----------------------------------------------------------------------+
>>> Ignite ver. 2.7.6#20190911-sha1:21f7ca41c4348909e2fd26ccf59b5b2ce1f4474e
>>> +----------------------------------------------------------------------+
>>> OS name: Windows 10 10.0 amd64
>>> CPU(s): 4
>>> Heap: 2.0GB
>>> VM name: 4484@xxxxxxx
>>> Local node [ID=7F470894-A979-443C-BC4F-7BCB047C7550, order=6, clientMode=true]
>>> Local node addresses: [xxxxxxx.xx.xxxxxxx.com/0:0:0:0:0:0:0:1, BLREQX1352123L.xx.xxxxxxx.com/10.73.4.44, 192.168.138.1/127.0.0.1, BLREQX1352123L.xx.xxxxxxx.com/172.22.192.1, /192.168.138.1, /192.168.234.1]
>>> Local ports: TCP:10801 TCP:47101
[16:39:14,562][INFO][Thread-10][GridDiscoveryManager] Topology snapshot [ver=6, locNode=7f470894, servers=1, clients=1, state=ACTIVE, CPUs=4, offheap=3.2GB, heap=4.0GB]
[16:39:45,598][INFO][main][Http11NioProtocol] Starting ProtocolHandler ["http-nio-5014"]
[16:39:45,598][INFO][main][NioSelectorPool] Using a shared selector for servlet write/read
[16:40:14,564][INFO][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=7f470894, uptime=00:01:00.016]
^-- H/N/C [hosts=1, nodes=2, CPUs=4]
^-- CPU [cur=0.13%, avg=31.59%, GC=0%]
^-- PageMemory [pages=0]
^-- Heap [used=183MB, free=91.06%, comm=1024MB]
^-- Off-heap [used=0MB, free=-1%, comm=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=0, qSize=0]
[16:40:20,864][INFO][exchange-worker-#38][GridCacheProcessor] Started cache [name=userRoleCache, id=-2021367519, memoryPolicyName=null, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false], encryptionEnabled=false]
[16:40:20,888][INFO][exchange-worker-#38][GridCacheProcessor] Finish proxy initialization, cacheName=userRoleCache, localNodeId=7f470894-a979-443c-bc4f-7bcb047c7550
[16:40:20,997][INFO][exchange-worker-#38][GridCacheProcessor] Stopped cache [cacheName=userRoleCache]
[16:40:21,000][INFO][exchange-worker-#38][GridCacheProcessor] Can not finish proxy initialization because proxy does not exist, cacheName=userRoleCache, localNodeId=7f470894-a979-443c-bc4f-7bcb047c7550
[16:40:21,001][INFO][exchange-worker-#38][GridCacheProcessor] Can not finish proxy initialization because proxy does not exist, cacheName=userRoleCache, localNodeId=7f470894-a979-443c-bc4f-7bcb047c7550
[16:40:21,001][INFO][exchange-worker-#38][GridCacheProcessor] Can not finish proxy initialization because proxy does not exist, cacheName=userRoleCache, localNodeId=7f470894-a979-443c-bc4f-7bcb047c7550
[16:40:21,015][INFO][exchange-worker-#38][GridCacheProcessor] Started cache [name=roleAppPermissions, id=-686786119, memoryPolicyName=null, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false], encryptionEnabled=false]
[16:40:21,039][INFO][exchange-worker-#38][GridCacheProcessor] Finish proxy initialization, cacheName=roleAppPermissions, localNodeId=7f470894-a979-443c-bc4f-7bcb047c7550
[16:40:21,063][INFO][exchange-worker-#38][GridCacheProcessor] Stopped cache [cacheName=roleAppPermissions]
[16:40:21,131][SEVERE][http-nio-5014-exec-4][JerseyConfig]] Servlet.service() for servlet [com.xxxx.xxxx.base.config.JerseyConfig] in context with path [/ba-cct-api] threw exception