
We have an Apache Ignite cluster in AKS, set up with 3 nodes. All 3 nodes are shown in sys.nodes and sys.baseline_nodes. Below is the configuration for one node:

    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
                    <property name="namespace" value="ignite"/>
                    <property name="serviceName" value="ignite-service"/>
                </bean>
            </property>
        </bean>
    </property>

    <property name="addressResolver">
        <bean class="org.apache.ignite.configuration.BasicAddressResolver">
            <constructor-arg>
                <map>
                    <entry key="127.0.0.1" value="52.2XX.X.XX"/>
                </map>
            </constructor-arg>
        </bean>
    </property>

But why are the log files filled with the messages below?

    [16:56:11,649][SEVERE][grid-nio-worker-tcp-comm-0-#23][TcpCommunicationSpi] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=42792440, bytesSent=867699, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=null, finished=false, heartbeatTs=1610470568808, hashCode=1382623580, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#23]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, outRecovery=null, closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1, super=GridNioSessionImpl [locAddr=/10.244.2.21:47100, rmtAddr=/10.240.0.5:44211, createTime=1610470564646, closeTime=0, bytesSent=18, bytesRcvd=0, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1610470564646, lastSndTime=1610470564646, lastRcvTime=1610470564646, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser@f7f9f74, directMode=true], GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1330)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2472)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2239)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1880)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
    [16:56:11,688][SEVERE][tcp-disco-sock-reader-[]-#12734-#14234][TcpDiscoverySpi] Failed to initialize connection (this can happen due to short time network problems and can be ignored if does not affect node discovery) [sock=Socket[addr=/10.240.0.5,port=10858,localport=47500]]
    java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader.body(ServerImpl.java:6757)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)

If you are running in plain AKS, then an addressResolver should not be needed. Follow these instructions: ignite.apache.org/docs/latest/installation/kubernetes/… - Alex K
We connect through Spark for data ingestion, and since the Spark worker nodes are thick client nodes, they need the public IPs of the server nodes to be able to join the cluster. Spark runs in Databricks, outside the AKS cluster. We followed gridgain.com/docs/latest/installation-guide/azure/… and are actually using 3 AKS clusters, each with one node, joined through load balancer IPs to form one Ignite cluster. Is there a better alternative that solves this problem and still forms the cluster? Do these logs affect the performance or stability of the cluster? - user2010872

1 Answer


It looks like your network is closing inactive connections, hence "Connection reset by peer".

Apache Ignite cannot prevent connection resets, but it works around them by re-establishing the connections when they are needed again.
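
If the resets really do come from idle connections being dropped, one thing to try is having Ignite close idle communication connections itself before the network does. Below is a minimal sketch, assuming the `idleConnectionTimeout` property of `TcpCommunicationSpi` (in milliseconds). The value 180000 is only an illustration, not a recommendation; it should be set below whatever idle timeout your load balancer or firewall actually enforces.

```xml
<!-- Sketch only: goes inside the IgniteConfiguration bean, next to discoverySpi. -->
<property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <!-- Assumed value (3 minutes): close idle connections on the Ignite side
             before the network's own idle timeout is reached. -->
        <property name="idleConnectionTimeout" value="180000"/>
    </bean>
</property>
```

This does not stop the network from resetting connections for other reasons, so some of these messages may still appear in the logs.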