
I have configured my Apache Ignite.NET instance to run as a server:

var cfg = new IgniteConfiguration
{
    CommunicationSpi = new TcpCommunicationSpi
    {
        LocalPort = config.CommunicationPort,
        LocalPortRange = config.CommunicationPortRange,
        MaxConnectTimeout = TimeSpan.FromMilliseconds(10000),
        ConnectTimeout = TimeSpan.FromMilliseconds(1000)
    },
    AutoGenerateIgniteInstanceName = true,
    ClientMode = false,
    IsActiveOnStart = true,
    DiscoverySpi = new TcpDiscoverySpi
    {
        LocalPort = config.DiscoveryPort,
        LocalPortRange = config.DiscoveryPortRange,
        ForceServerMode = true,
        LocalAddress = localAddress,
        IpFinder = new TcpDiscoveryStaticIpFinder
        {
            Endpoints = config.ClusterEndPoints
        }
    },
    Localhost = config.LocalAddress,
};

I set ForceServerMode = true, and in the static IP finder's Endpoints I list my local IP along with the IPs of the other nodes in my cluster.

What I'm seeing is that, for some reason, Ignite's join requests time out. Here's the exception log I get:

Level: [Error], Message:[Exception on direct send: connect timed out] Native:[java.net.SocketTimeoutException: connect timed out
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1376)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1339)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendMessageDirectly(ServerImpl.java:1159)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendJoinRequestMessage(ServerImpl.java:1006)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:851)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:358)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1834)
at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:837)
at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1770)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:977)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1896)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1648)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1076)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:574)
at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:48)
at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:76)

]

That part is fine: maybe there is a network issue, a partition, a firewall, etc., and I can figure that out.

What I don't understand is why the call that starts the Ignite node hangs. I expect it to try to connect to those endpoints and, if it can't, simply start a local node. Here's how I start my node:

Ignition.Start(cfg);

Instead, it keeps trying to join, the timeout errors above keep being logged, it never stops, and the application hangs indefinitely.

Am I missing some configuration that makes Ignite give up trying to connect and either start in local mode or fail altogether?

[Edit] This only happens when other apps running Ignite already form a cluster and this new node tries to join that cluster via static IPs (and its VM has a bad network configuration that prevents it from talking to the existing cluster). If I start this new node when no other Ignite instances are running, it does NOT hang; it just goes ahead and starts a local Ignite node.
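A minimal sketch of the kind of setting being asked about, assuming the Ignite.NET TcpDiscoverySpi.JoinTimeout property (a TimeSpan where the default of zero means the node waits for the join indefinitely). With a non-zero value the join attempt should give up after that period, so Ignition.Start throws instead of hanging; it would not make the node fall back to a standalone local node. The 30-second value and the endpoint string are hypothetical placeholders, and the exact exception type thrown may vary by version, hence the broad catch.

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Discovery.Tcp;
using Apache.Ignite.Core.Discovery.Tcp.Static;

var cfg = new IgniteConfiguration
{
    DiscoverySpi = new TcpDiscoverySpi
    {
        // Assumption: TimeSpan.Zero (the default) waits forever for the join;
        // a non-zero value makes the join fail once this period has elapsed.
        JoinTimeout = TimeSpan.FromSeconds(30),
        IpFinder = new TcpDiscoveryStaticIpFinder
        {
            // Hypothetical endpoint list; use your real cluster addresses.
            Endpoints = new[] { "10.211.55.3:49100..49110" }
        }
    }
};

try
{
    IIgnite ignite = Ignition.Start(cfg);
    Console.WriteLine("Joined topology as node " + ignite.GetCluster().GetLocalNode().Id);
}
catch (Exception e)
{
    // The join did not complete within JoinTimeout (or startup failed for
    // another reason); decide here whether to retry, alert, or exit.
    Console.WriteLine("Ignite failed to start: " + e.Message);
}
```

This trades the indefinite hang for a startup failure, which matches the "or just fail altogether" option rather than a local-only fallback.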

So I dug deeper into the logs, and Ignite does report that the local node is initialized. Here's the log: Local node initialized: TcpDiscoveryNode [id=1a55b46a-c270-4450-b902-eb5fd28906bc, addrs=[10.211.55.3], sockAddrs=[/10.211.55.3:49100], discPort=49100, order=0, intOrder=0, lastExchangeTime=1504661932964, loc=true, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false But the call to Ignition.Start(cfg) in my .NET app never returns. – DKhanaf
Finally, I also see this log: [Warn], Message:[Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]] Native:[] So I guess the question is: how do I keep the join process from blocking my application? – DKhanaf
I'm not sure if the problem is with your particular configuration or with Ignite.NET itself. Are you able to start nodes with the default config? Run the examples? – Pavel Tupitsyn
@PavelTupitsyn I am able to run Ignite just fine when the VM network configuration allows communication between the host VMs of my Ignite instances, and in that case the new node joins the cluster and Ignite works as expected. However, if there is some kind of network issue, the node that's trying to start hangs instead of starting a local node without joining the cluster. My problem with this is that any potential for my app to hang on startup makes me an unhappy camper. – DKhanaf
