0
votes

When running in our scale test setup we notice that when ignite server node gets restarted a couple of ignite client nodes alone get stuck indefinitely on calls like QueryCursorImpl#iterator instead of failing by throwing ClientDisconnected or CacheStopped exception (few other client nodes do & get reconnected as we have code in place to disconnect & reconnect when this happens since there are issues with automatic reconnection WRT ignite resource handles when used in a container environment like Spring Boot by us).

From the thread dump of these services i see that a huge number of threads are stuck (parked) with the below stack trace,

stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)

The above is a partial trace that is common to all of them which originate either as cache reads or writes. Below are a few examples,

Scheduled-task-pool-9 - threadId:194 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendGeneric(GridIoManager.java:1727)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.send(IgniteH2Indexing.java:2511)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.send(GridReduceQueryExecutor.java:1419)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:732)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1339)
at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$9.iterator(IgniteH2Indexing.java:1403)
at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
at com.**.**.configuration.ClientHealthBasedReconnectWrapper.monitorHealth(ClientHealthBasedReconnectWrapper.java:102)
at sun.reflect.GeneratedMethodAccessor403.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <57b75756> (a java.util.concurrent.ThreadPoolExecutor$Worker)


http-nio-7051-exec-75 - threadId:8896 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendGeneric(GridIoManager.java:1727)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.send(IgniteH2Indexing.java:2511)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.send(GridReduceQueryExecutor.java:1419)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.releaseRemoteResources(GridReduceQueryExecutor.java:1037)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:835)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1339)
at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
at org.apache.ignite.internal.processors.cache.QueryCursorImpl.getAll(QueryCursorImpl.java:114)
at com.***.***.perfmon.datastore.service.DataStoreCacheService.getCongestionPortsSummary(DataStoreCacheService.java:1826)
at com.***.***.perfmon.datastore.controller.DataStoreCacheServiceController.getCongestionPortsSummary(DataStoreCacheServiceController.java:199)
at sun.reflect.GeneratedMethodAccessor574.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:109)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:800)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1471)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
- locked <58365b7b> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <209011f6> (a java.util.concurrent.ThreadPoolExecutor$Worker)


pool-2-thread-10 - threadId:42 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1166)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.proceedPrepare(GridNearOptimisticTxPrepareFuture.java:593)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepareSingle(GridNearOptimisticTxPrepareFuture.java:405)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepare0(GridNearOptimisticTxPrepareFuture.java:348)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepareOnTopology(GridNearOptimisticTxPrepareFutureAdapter.java:137)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepare(GridNearOptimisticTxPrepareFutureAdapter.java:74)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareNearTxLocal(GridNearTxLocal.java:3161)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commitNearTxLocalAsync(GridNearTxLocal.java:3221)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.optimisticPutFuture(GridNearTxLocal.java:2391)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:802)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:361)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter$35.inOp(GridCacheAdapter.java:2821)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:5076)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4088)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2819)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2808)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1089)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:942)
at com.***.***.perfmon.datastore.cachehandler.FcProductStatCacheHandler.insertCache(FcProductStatCacheHandler.java:225)
at com.***.***.perfmon.datastore.updatehandler.DataStoreUpdateHandler.storeData(DataStoreUpdateHandler.java:168)
at com.***.***.perfmon.datastore.messaging.PerfmonProcessedStatsDataConsumer$2.run(PerfmonProcessedStatsDataConsumer.java:150)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <613a8ee1> (a java.util.concurrent.ThreadPoolExecutor$Worker)


pool-4-thread-5 - threadId:53 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.submit(DataStreamerImpl.java:1798)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.flush(DataStreamerImpl.java:1534)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.doFlush(DataStreamerImpl.java:1074)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.closeEx(DataStreamerImpl.java:1240)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.closeEx(DataStreamerImpl.java:1211)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.close(DataStreamerImpl.java:1199)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.close(DataStreamerImpl.java:1286)
at com.***.***.perfmon.datastore.cachehandler.RawFcPortStatCacheHandler.insertCacheBulk(RawFcPortStatCacheHandler.java:282)
at com.***.***.perfmon.datastore.updatehandler.DataStoreUpdateHandler.storeData(DataStoreUpdateHandler.java:180)
at com.***.***.perfmon.datastore.messaging.PerfmonStreamingStatsDataConsumer.processAndStoreAggData(PerfmonStreamingStatsDataConsumer.java:262)
at com.***.***.perfmon.datastore.messaging.PerfmonStreamingStatsDataConsumer$2.run(PerfmonStreamingStatsDataConsumer.java:381)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <6e46d9f4> (a java.util.concurrent.ThreadPoolExecutor$Worker)

In this particular setup we simply have a single server node & about 25 odd client nodes and they are all running in a docker swarm overlay network (a few of these would update a lot of caches inside of a transaction, basically open trx, acquire locks on some keys, then update several of the caches via jcache apis before closing the trx, i suspect this locking of keys to be an issue but thats a separate one for which i will ask in a different question).

Any one have any clues, suggestions or inputs on this in terms of how to avoid/work around this?

We are running version 2.4 & using Spring integration (planning to upgrade soon).

Thanks Muthu

UPDATE (10/16/18):

In the thread dumps of one of the two stuck client nodes, i see this stuck thread consistently which on looking at the code looks to be like the cause of the other threads getting stuck although i do not see this on the other client node's thread dumps. Could this be a problem?

"tcp-client-disco-msg-worker-#4" - Thread t@127
   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
        at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1166)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.removeLocks(GridDhtColocatedCache.java:859)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.undoLocks(GridDhtColocatedLockFuture.java:389)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onComplete(GridDhtColocatedLockFuture.java:586)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onDone(GridDhtColocatedLockFuture.java:565)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onDone(GridDhtColocatedLockFuture.java:90)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
        at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.cancelClientFutures(GridCacheMvccManager.java:386)
        at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onDisconnected(GridCacheMvccManager.java:378)
        at org.apache.ignite.internal.processors.cache.GridCacheSharedContext.onDisconnected(GridCacheSharedContext.java:343)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onDisconnected(GridCacheProcessor.java:1036)
        at org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3793)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:779)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:576)
        - locked <fb11fd7> (a java.lang.Object)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2414)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2393)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1709)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

   Locked ownable synchronizers:
        - None
1

1 Answers

1
votes

I don't think that's normal.

What you have here is basically Client tries to connect to restarted Server node via Communication, getting "I don't know who you are" responses and retrying.

What should happen is that Client should disconnect from topology and reconnect a new one via Discovery before trying Communication again. Do you have full logs from client?

Since it does not look like normal behavior, I'd try 2.6 or the upcoming 2.7, see if it is any better.