0
votes

So I'm trying to setup an Ignite cluster with this default-config.xml for both nodes:

<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans
                        http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="workDirectory" value="/mnt/e/apache-ignite"/>
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                    <property name="addresses">
                        <list>
                            <value>node1:47500..47509</value>
                            <value>node2:47500..47509</value>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <property name="localPort" value="47100"/>
        </bean>
    </property>
    <property name="fileSystemConfiguration">
        <list>
            <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
                <property name="name" value="igfs"/>
                <property name="ipcEndpointConfiguration">
                    <bean class="org.apache.ignite.igfs.IgfsIpcEndpointConfiguration">
                        <property name="type" value="TCP"/>
                        <property name="host" value="0.0.0.0"/>
                        <property name="port" value="10500"/>
                    </bean>
                </property>
                <property name="secondaryFileSystem">
                    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
                        <property name="fileSystemFactory">
                            <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
                                <property name="uri" value="hdfs://node1:9000/"/>
                                <property name="configPaths">
                                    <list>
                                        <value>/mnt/e/hadoop/etc/hadoop/core-site.xml</value>
                                    </list>
                                </property>
                            </bean>
                        </property>
                    </bean>
                </property>
            </bean>
        </list>
    </property>
</bean>

I can start both nodes separately without any problems using ignite.sh. But when I try to join both nodes I keep getting following error:

    class org.apache.ignite.IgniteException: Failed to start manager: GridManagerAdapter [enabled=true,  name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1067)
        at org.apache.ignite.Ignition.start(Ignition.java:349)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start manager:    GridManagerAdapter [enabled=true,   name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1965)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1276)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2045)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1703)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1117)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1035)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:921)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:820)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:659)
        at org.apache.ignite.Ignition.start(Ignition.java:346)
        ... 1 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@1b5c3e5f], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:943)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1960)
        ... 11 more
   Caused by: class org.apache.ignite.spi.IgniteSpiException: Impossible to continue join, check if local discovery and communication ports are not blocked with firewall [addr=OmUsVdiDist0221/192.168.175.221:47500, req=TcpDiscoveryJoinRequestMessage [node=TcpDiscoveryNode [id=98e6971f-b477-4518-a25d-1d8ff8a33c46, consistentId=0:0:0:0:0:0:0:1%lo,127.0.0.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1], sockAddrs=HashSet [/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=0, intOrder=0, lastExchangeTime=1602083346717, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=false], dataPacket=org.apache.ignite.spi.discovery.tcp.internal.DiscoveryDataPacket@2b59501e, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=3a6cb930571-98e6971f-b477-4518-a25d-1d8ff8a33c46, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], discoLocalPort=47500, discoLocalPortRange=100]
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendJoinRequestMessage(ServerImpl.java:1292)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1032)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:427)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2099)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
        ... 13 more
Failed to start grid: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]

I'm currently using the latest version of Ignite (2.8.1) and two Windows machines (running Ignite on wsl). I don't think a firewall is blocking the discovery or communication ports because using telnet when one of the nodes is started works without any problems.

administrator@node1:~$ telnet <node2-ip> 47100
Trying <node2-ip>...
Connected to <node2-ip>.
Escape character is '^]'.
0[GOi#^CConnection closed by foreign host.
administrator@node1:~$ telnet <node2-ip> 47500
Trying <node2-ip>...
Connected to <node2-ip>.
Escape character is '^]'.
Connection closed by foreign host.

I'm a little lost here. Maybe I'm doing something wrong in my configuration ?

EDIT

When running ignite.bat on the second worker using powershell, the node is added to the topology without any problems.

1
You're doing telnet from node1 to node1. What happens if you telnet from node1 to node2?alamar
My bad, I edited out the actual ip addresses and changed it all to node1. But I'm actually telnet from node1 to node2. I'll edit my question. Thanks for the remark.RudyVerboven
Looks like node1 was able to establish connection to node2, but at the same time node2 is unable to ping node1. Can you check that it's possible to establish connection in both directions between the nodes?Igor Belyakov
Yes ping from node2 to node1 is possible. Also when using powershell everything works just fine. But it would be better for my case that I run both nodes from inside the WSL.RudyVerboven

1 Answers

0
votes

Try setting the correct external localAddress in both nodes' discovery and communication SPI's.

Otherwise, it seems that 98e6971f node will only acknowledge its localhost address.