
We are running Apache Ignite 2.4 and using IgniteSpringBean and Spring Data repositories for cluster and cache access.

Since we have been having a lot of issues related to IgniteClientDisconnectedException, I am writing a manual segmentation resolver. I disabled automatic client reconnection (clientReconnectDisabled set to true), and the resolver detects the disconnected condition with a simple cache query that runs periodically. It then initiates a disconnect via IgniteSpringBean#close, followed by a reconnect using the code fragment below (very similar to this discussion: http://apache-ignite-users.70518.x6.nabble.com/SPI-has-already-been-started-always-create-new-configuration-instance-for-each-starting-Ignite-instar-td7360.html).
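A minimal sketch of the periodic health check described above, written with JDK classes only so that it stands on its own. ClientHealthMonitor and its probe/reconnect callbacks are hypothetical names: in this setup the probe would be the small cache query, and the callback would call DCMIgniteSpringBean#reconnect with a fresh configuration. It assumes automatic reconnection is already disabled on the Ignite side (clientReconnectDisabled, which can be set through TcpDiscoverySpi#setClientReconnectDisabled(true)).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical health monitor: runs a cheap probe (for example a small cache query)
 * and triggers a manual reconnect when the probe fails or throws.
 * Assumes automatic client reconnection is disabled on the Ignite side.
 */
final class ClientHealthMonitor {
    private final BooleanSupplier probe;   // true when the cluster answered normally
    private final Runnable reconnect;      // e.g. calls DCMIgniteSpringBean#reconnect(freshConfig)
    private final AtomicBoolean reconnecting = new AtomicBoolean(false);

    ClientHealthMonitor(BooleanSupplier probe, Runnable reconnect) {
        this.probe = probe;
        this.reconnect = reconnect;
    }

    /** Runs one health check; returns true if it triggered a reconnect. */
    boolean checkOnce() {
        boolean healthy;
        try {
            healthy = probe.getAsBoolean();
        } catch (RuntimeException e) {
            // Unchecked failures such as IgniteClientDisconnectedException, or the
            // IllegalStateException wrapping CacheStoppedException, count as unhealthy.
            healthy = false;
        }
        if (healthy || !reconnecting.compareAndSet(false, true)) {
            return false; // healthy, or another reconnect is already in progress
        }
        try {
            reconnect.run();
        } finally {
            reconnecting.set(false);
        }
        return true;
    }

    /** Schedules checkOnce() on one background thread with a fixed delay between runs. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // An exception escaping here would suppress all later runs of the task.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Treating any unchecked exception from the probe as "unhealthy" keeps the check simple; a stricter version could react only to the specific exception types seen in the logs.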

Code fragment from DCMIgniteSpringBean#reconnect() (DCMIgniteSpringBean is the bean referenced in the XML config below):

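import java.util.Arrays;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Inside DCMIgniteSpringBean (extends IgniteSpringBean); LOGGER is assumed to be an SLF4J-style logger.
public final void reconnect(final IgniteConfiguration specifiedIgniteConfiguration) {
  LOGGER.info("Initiating reconnect..");
  try {
    // Stop the current (disconnected) client node.
    close();
    //destroy();
  } catch (Exception e) {
    LOGGER.warn("Error while disconnecting", e);
  }
  LOGGER.info("Disconnected..");
  try {
    Thread.sleep(1000);
  } catch (InterruptedException e) {
    // Restore the interrupt flag instead of swallowing it.
    Thread.currentThread().interrupt();
    LOGGER.warn("Interrupted while pausing to reconnect", e);
  }
  // Every start needs a fresh IgniteConfiguration; re-using one whose SPIs were already started fails.
  setConfiguration(specifiedIgniteConfiguration);
  // Start a new Ignite node from the configuration set above.
  afterSingletonsInstantiated();
  final CacheConfiguration[] cfgArray = specifiedIgniteConfiguration.getCacheConfiguration();
  // Passing the raw array would bind to the varargs overload and log only its first element.
  LOGGER.info("Cache configuration is : {}", Arrays.toString(cfgArray));
  if (cfgArray != null) {
    getOrCreateCaches(Arrays.asList(cfgArray));
  }
  LOGGER.info("Reconnected..");
}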

The XML bean config fragment:

<bean id="igniteInstance" class="com.brocade.dcm.configuration.DCMIgniteSpringBean">
        <property name="configuration" ref="grid.cfg"/>
</bean>
<bean id="grid.cfg.provider" class="com.brocade.dcm.configuration.ClientHealthBasedReconnectWrapper">
        <lookup-method name="createIgniteConfiguration" bean="grid.cfg"/>
</bean>
<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration" scope="prototype">
...
...
</bean>
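The prototype scope plus lookup-method means each call to createIgniteConfiguration returns a new IgniteConfiguration, which matters because the SPIs held by a configuration cannot be started twice (the point of the linked discussion). The JDK-only model below illustrates that constraint; OneShotSpi, NodeConfig and Node are hypothetical stand-ins, not Ignite classes, and the Supplier plays the role of the lookup method.

```java
import java.util.function.Supplier;

/** Stand-in for an SPI such as discovery: one instance can be started only once. */
final class OneShotSpi {
    private boolean started;

    void start() {
        if (started) {
            throw new IllegalStateException("SPI has already been started");
        }
        started = true;
    }
}

/** Stand-in for IgniteConfiguration: each configuration owns its own SPI instance. */
final class NodeConfig {
    final OneShotSpi discoverySpi = new OneShotSpi();
}

/** Stand-in for a node; starting it starts the SPIs of the given configuration. */
final class Node {
    static Node start(NodeConfig cfg) {
        cfg.discoverySpi.start();
        return new Node();
    }

    /** Each restart pulls a new configuration, like a lookup-method on a prototype bean. */
    static Node restart(Supplier<NodeConfig> freshConfig) {
        return start(freshConfig.get());
    }
}
```

Restarting with the same NodeConfig fails on the second start, while Node.restart(NodeConfig::new) can be called repeatedly, which is why grid.cfg is declared with scope="prototype" here.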

With the above, I got this to work: my extended IgniteSpringBean client reconnects properly and starts all the caches as well.

However, even though the client is connected and the caches are started, all subsequent calls/queries through any IgniteCache and IgniteRepository instances obtained before the reconnect fail with CacheStoppedException (below), which renders those references unusable.

Can someone suggest how I could refresh these references? I know that when the client reconnects automatically after a disconnect, the references continue to work fine, which tells me there is a way to refresh them and that I am not doing it.

Any expert ideas on how to achieve this? It feels like I am close, but still far, given that I am relying on hacks :-(

Below are the exceptions I get for IgniteCache#query() and IgniteRepository#findByXXX() calls, respectively:

class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): FabricInfoCache
    at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:164)
    at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1684)
    at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:365)
    at com.brocade.dcm.configuration.ClientHealthBasedReconnectWrapper.monitorHealth(ClientHealthBasedReconnectWrapper.java:110)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


[Request processing failed; nested exception is java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): WebsocketCacheInfo] with root cause
class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): WebsocketCacheInfo
    at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:164)
    at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1684)
    at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:365)
    at org.apache.ignite.springdata.repository.query.IgniteRepositoryQuery.execute(IgniteRepositoryQuery.java:117)
    at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.doInvoke(RepositoryFactorySupport.java:483)
    at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.invoke(RepositoryFactorySupport.java:461)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.data.projection.DefaultMethodInvokingMethodInterceptor.invoke(DefaultMethodInvokingMethodInterceptor.java:61)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.data.repository.core.support.SurroundingTransactionDetectorMethodInterceptor.invoke(SurroundingTransactionDetectorMethodInterceptor.java:57)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
    at com.sun.proxy.$Proxy182.findByWebsocketSessionId(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
    at com.sun.proxy.$Proxy124.findByWebsocketSessionId(Unknown Source)

Thanks,
Muthu

Just to clarify: if I fetch new instances of the same IgniteCache with IgniteSpringBean#cache(<name>), they work fine, but references obtained before the disconnection do not work after the reconnect. That doesn't help, because we can't do this for every cache and repository reference in the system. Let me know.
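Since fetching the cache again by name works, one possible workaround (not something the thread confirms) is to inject a JDK dynamic proxy that looks its target up again on every call instead of holding one instance. ResolvingProxy is a hypothetical helper built only on java.lang.reflect.Proxy:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

/**
 * Hypothetical helper: a JDK dynamic proxy that resolves its target again on every
 * call instead of holding on to one instance.
 */
final class ResolvingProxy {
    private ResolvingProxy() {
    }

    static <T> T of(Class<T> iface, Supplier<? extends T> resolver) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                // Resolve the current target (e.g. the cache on the restarted node) per call.
                return method.invoke(resolver.get(), args);
            } catch (InvocationTargetException e) {
                // Rethrow the target's own exception instead of the reflection wrapper.
                throw e.getCause();
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

With Ignite this could look like ResolvingProxy.of(IgniteCache.class, () -> igniteSpringBean.cache("FabricInfoCache")) (a raw type, so assigning it to a parameterized IgniteCache is an unchecked conversion). It adds a name lookup per call, and it only covers IgniteCache references: the second stack trace shows IgniteRepositoryQuery calling into its own cache proxy, so Spring Data repositories would likely still need separate handling.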

2 Answers


I believe this should be fixed in 2.5:

https://issues.apache.org/jira/browse/IGNITE-2766

Please try this version.


For others facing this issue: I fixed the problem by building Ignite from source and patching GatewayProtectedCacheProxy#checkProxyIsValid and GridCacheContext.

Special thanks to @Michael for sharing the related issue, which helped me get to this solution.

Basically, when Ignite is stopped and restarted, the wrapped cache proxy references (for IgniteCache/IgniteRepository) that were handed out before the restart are left with a stale kernel context, because the kernel is stopped and restarted as a new instance. The (Spring) application holds these references (from various injections), so subsequent calls through them fail. The fix is to check whether a running kernel instance exists for the same Ignite instance name and, if so, update the proxy references when a cache with the same name has been started and is available.
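The rebinding idea can be modeled without Ignite. In this JDK-only sketch, Kernel and CacheRef are hypothetical stand-ins for IgniteKernal and the gateway-protected cache proxy; they are not Ignite's API, and Kernel.running returns null where the real Ignition.ignite(name) throws.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for IgniteKernal: a named instance with a stopped flag and started caches. */
final class Kernel {
    private static final Map<String, Kernel> RUNNING = new ConcurrentHashMap<>();

    final String instanceName;
    private final Set<String> caches = ConcurrentHashMap.newKeySet();
    private volatile boolean stopped;

    private Kernel(String instanceName) {
        this.instanceName = instanceName;
    }

    static Kernel start(String instanceName, String... cacheNames) {
        Kernel kernel = new Kernel(instanceName);
        for (String cacheName : cacheNames) {
            kernel.caches.add(cacheName);
        }
        RUNNING.put(instanceName, kernel);
        return kernel;
    }

    void stop() {
        stopped = true;
        RUNNING.remove(instanceName, this);
    }

    boolean isStopped() {
        return stopped;
    }

    boolean hasCache(String cacheName) {
        return caches.contains(cacheName);
    }

    /** Running kernel with this name, or null if none is running. */
    static Kernel running(String instanceName) {
        return RUNNING.get(instanceName);
    }
}

/** Stand-in for a cache proxy that application code obtained before the restart. */
final class CacheRef {
    private final String cacheName;
    private Kernel kernel;

    CacheRef(Kernel kernel, String cacheName) {
        this.kernel = kernel;
        this.cacheName = cacheName;
    }

    /** Kernel this reference talks to, rebinding to a restarted kernel when possible. */
    synchronized Kernel kernel() {
        if (kernel.isStopped()) {
            // Same idea as the patch: find a running kernel with the same instance
            // name that has started a cache with the same name, and switch to it.
            Kernel current = Kernel.running(kernel.instanceName);
            if (current != null && current.hasCache(cacheName)) {
                kernel = current;
            }
        }
        if (kernel.isStopped()) {
            throw new IllegalStateException("Failed to perform cache operation (cache is stopped): " + cacheName);
        }
        return kernel;
    }
}
```

A reference created against the first kernel keeps failing while no replacement is running, then transparently switches once a kernel with the same instance name has started the same cache; a reference to a cache the new kernel did not start keeps failing.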

// Requires: import org.apache.ignite.IgniteIllegalStateException;
private GridCacheGateway<K, V> checkProxyIsValid(@Nullable GridCacheGateway<K, V> gate, boolean tryRestart) {
    ..
    ..
    // Added: if this proxy's kernel has been stopped, re-point the cache context
    // at the running kernel that has the same Ignite instance name.
    if (isCacheProxy && tryRestart && gate.isStopped() &&
        context().kernalContext().gateway().getState() == GridKernalState.STOPPED) {
        try {
            IgniteKernal igniteKernal = (IgniteKernal) Ignition.ignite(context().gridConfig().getIgniteInstanceName());

            context().setGridKernalContext(igniteKernal.context());
        }
        catch (IgniteIllegalStateException ignored) {
            // Ignition.ignite(name) throws (rather than returning null) when no such instance is running.
        }
    }

    // Rebind the proxy to the cache started on the (possibly new) kernel.
    if (isCacheProxy && tryRestart && gate.isStopped() &&
        context().kernalContext().gateway().getState() == GridKernalState.STARTED) {
        IgniteCacheProxyImpl proxyImpl = (IgniteCacheProxyImpl) delegate;

        try {
            IgniteInternalCache<K, V> cache = context().kernalContext().cache().<K, V>publicJCache(context().name()).internalProxy();

            GridFutureAdapter<Void> fut = proxyImpl.opportunisticRestart();

            if (fut == null)
                proxyImpl.onRestarted(cache.context(), cache.context().cache());
            else
                new IgniteFutureImpl<>(fut).get();

            return gate();
        } catch (IgniteCheckedException ice) {
            // Opportunity didn't work out.
        }
    }

    return gate;
}

/**
 * NOTE: This method goes into GridCacheContext.java.
 *
 * @param ctx Kernal context of the restarted Ignite instance.
 */
public void setGridKernalContext(GridKernalContext ctx) {
    this.ctx = ctx;
}