So I have configured my Apache Ignite.NET instance to run as server:
```csharp
var cfg = new IgniteConfiguration
{
    CommunicationSpi = new TcpCommunicationSpi
    {
        LocalPort = config.CommunicationPort,
        LocalPortRange = config.CommunicationPortRange,
        MaxConnectTimeout = TimeSpan.FromMilliseconds(10000),
        ConnectTimeout = TimeSpan.FromMilliseconds(1000)
    },
    AutoGenerateIgniteInstanceName = true,
    ClientMode = false,
    IsActiveOnStart = true,
    DiscoverySpi = new TcpDiscoverySpi
    {
        LocalPort = config.DiscoveryPort,
        LocalPortRange = config.DiscoveryPortRange,
        ForceServerMode = true,
        LocalAddress = localAddress,
        IpFinder = new TcpDiscoveryStaticIpFinder
        {
            Endpoints = config.ClusterEndPoints
        }
    },
    Localhost = config.LocalAddress,
};
```
I set `ForceServerMode = true`, and the static IP finder's `Endpoints` list contains my local IP along with the IPs of the other nodes in my cluster.
What I'm seeing is that Ignite's join requests time out. Here's the exception I get in the log:
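For reference, each entry in `TcpDiscoveryStaticIpFinder.Endpoints` is an address, optionally followed by a port or a port range of the form `host:firstPort..lastPort`. Below is a minimal sketch of building such entries; the helper name `DiscoveryEndpoints.Build` and the addresses used with it are hypothetical, and the port range should be whatever range the remote nodes may actually bind their discovery port to.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DiscoveryEndpoints
{
    // Hypothetical helper: turns a list of hosts into static IP finder entries
    // of the form "host:firstPort..lastPort", e.g. "10.0.0.5:47500..47509".
    public static List<string> Build(IEnumerable<string> hosts, int firstPort, int lastPort)
    {
        return hosts
            .Select(host => $"{host}:{firstPort}..{lastPort}")
            .ToList();
    }
}
```

The resulting list can be assigned directly, e.g. `IpFinder = new TcpDiscoveryStaticIpFinder { Endpoints = DiscoveryEndpoints.Build(hosts, 47500, 47509) }`.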
```
Level: [Error], Message:[Exception on direct send: connect timed out] Native:[java.net.SocketTimeoutException: connect timed out
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1376)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1339)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendMessageDirectly(ServerImpl.java:1159)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendJoinRequestMessage(ServerImpl.java:1006)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:851)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:358)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1834)
at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:837)
at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1770)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:977)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1896)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1648)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1076)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:574)
at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:48)
at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:76)
]
```
That part is fine: there is probably some network issue (partitioning, a firewall, etc.), and I can figure that out.
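The trace shows that the timeout is raised by a plain `java.net.Socket.connect` call inside `TcpDiscoverySpi.openSocket`, so the same network problem can be checked without Ignite by opening a TCP connection to each discovery endpoint with a timeout. A minimal sketch using `System.Net.Sockets.TcpClient`; the `PortProbe` name and the timeout values are assumptions:

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class PortProbe
{
    // Hypothetical helper: returns true if a TCP connection to host:port is
    // established within the timeout. A false result corresponds to Ignite's
    // "connect timed out" (or to a refused/unreachable connection).
    public static bool CanConnect(string host, int port, TimeSpan timeout)
    {
        using (var client = new TcpClient())
        {
            try
            {
                Task connect = client.ConnectAsync(host, port);
                // Wait returns false if the timeout elapses before the connect completes.
                return connect.Wait(timeout) && client.Connected;
            }
            catch (AggregateException)
            {
                // The connect task faulted: refused, host unreachable, DNS failure, etc.
                return false;
            }
            catch (SocketException)
            {
                // Some failures may be thrown synchronously instead of via the task.
                return false;
            }
        }
    }
}
```

Running something like `PortProbe.CanConnect("10.0.0.5", 47500, TimeSpan.FromSeconds(1))` from the new VM only checks one direction; the existing nodes also need to be able to open connections back to the new node's discovery and communication ports.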
What I don't understand is why the call that starts the Ignite node hangs. I expect it to try to connect to those endpoints and, if it cannot, to simply start a local node. Here's how I start my node:
```csharp
Ignition.Start(cfg);
```
Instead, it keeps retrying the join, the timeout errors above keep getting logged, it never stops, and the application hangs indefinitely.
Am I missing some configuration that makes Ignite give up trying to connect and either start in local mode or fail altogether?
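One property that bounds the join is `TcpDiscoverySpi.JoinTimeout`. Its default is `TimeSpan.Zero`, which means "wait forever" and matches the endless retry loop described above; `NetworkTimeout` (5 seconds by default) only limits individual network operations, not the join as a whole. A minimal sketch, assuming Ignite.NET's `JoinTimeout` behaves like the Java `joinTimeout` and makes `Ignition.Start` fail once it elapses; the exact exception type and message may differ between versions, and `IgniteStartup`, `BuildConfig` and `TryStart` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Common;
using Apache.Ignite.Core.Discovery.Tcp;
using Apache.Ignite.Core.Discovery.Tcp.Static;

public static class IgniteStartup
{
    // Builds a server-node configuration whose join attempt is bounded.
    public static IgniteConfiguration BuildConfig(ICollection<string> endpoints, TimeSpan joinTimeout)
    {
        return new IgniteConfiguration
        {
            AutoGenerateIgniteInstanceName = true,
            ClientMode = false,
            DiscoverySpi = new TcpDiscoverySpi
            {
                // Upper bound on the whole join process (TimeSpan.Zero = wait forever).
                JoinTimeout = joinTimeout,
                IpFinder = new TcpDiscoveryStaticIpFinder { Endpoints = endpoints }
            }
        };
    }

    // Returns the started node, or null if it could not join (or start) in time.
    // Assumption: start failures surface as IgniteException.
    public static IIgnite TryStart(IgniteConfiguration cfg)
    {
        try
        {
            return Ignition.Start(cfg);
        }
        catch (IgniteException e)
        {
            Console.WriteLine("Ignite failed to start: " + e.Message);
            return null;
        }
    }
}
```

On a `null` result the caller can decide whether to fail outright or to retry with an IP finder that lists only the local endpoint; note that a node started that way forms its own single-node cluster rather than joining the existing one.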
[Edit] This only happens when other apps running Ignite already form a cluster and this new node tries to join that cluster via static IPs (and its VM has a bad network config that prevents it from talking to the existing cluster). If I start this new node when no Ignite instances are running, it does NOT hang; it just goes ahead and starts a local Ignite node.
Local node initialized: TcpDiscoveryNode [id=1a55b46a-c270-4450-b902-eb5fd28906bc, addrs=[10.211.55.3], sockAddrs=[/10.211.55.3:49100], discPort=49100, order=0, intOrder=0, lastExchangeTime=1504661932964, loc=true, ver=2.1.0#20170720-sha1:bdaeecca, isClient=false
But the call in my .NET app to Ignition.Start(cfg) never comes back. – DKhanaf
[Warn], Message:[Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]] Native:[]
So I guess the question is, how do I get the join process to not block my application? – DKhanaf
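If the goal is only to keep the application responsive, the start call can be moved off the calling thread. A minimal sketch using `Task.Run` and `Task.WhenAny`; the helper names are hypothetical, and the thread running the start keeps blocking after the wait expires (nothing is cancelled), so this complements a bounded `TcpDiscoverySpi.JoinTimeout` rather than replacing it:

```csharp
using System;
using System.Threading.Tasks;

public static class BackgroundStart
{
    // Runs a potentially blocking start routine, e.g. () => Ignition.Start(cfg),
    // on the thread pool so the calling thread is not blocked.
    public static Task<T> StartInBackground<T>(Func<T> start) => Task.Run(start);

    // Waits up to 'wait' for the start task. Returns (true, result) if it completed
    // successfully in time, otherwise (false, default). The task is NOT cancelled
    // on timeout; it keeps running in the background.
    public static async Task<(bool Started, T Result)> WaitForStart<T>(Task<T> startTask, TimeSpan wait)
    {
        Task finished = await Task.WhenAny(startTask, Task.Delay(wait));
        if (finished == startTask && startTask.Status == TaskStatus.RanToCompletion)
        {
            return (true, startTask.Result);
        }
        return (false, default(T));
    }
}
```

Usage might look like `var task = BackgroundStart.StartInBackground(() => Ignition.Start(cfg));` followed by `var (started, ignite) = await BackgroundStart.WaitForStart(task, TimeSpan.FromSeconds(30));`, with the application continuing in a degraded mode while `started` is false.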