
We are trying to submit a Spark on YARN job that imports data from HDFS into Apache Ignite, so we need to specify the Ignite configuration file path for the Spark containers.

The examples on the Ignite website only define a path like "conf/cache.xml", and then the Spark driver and executors "magically" find the file; I don't understand how the Spark executors locate it.

We have tried several approaches, none of which worked:

  • Specify a full path like "file:///disk1/conf/cache.xml" in the code

  • Upload the config file to HDFS and specify it like "hdfs:///hdfs_root/conf/cache.xml"

  • Specify the full path in spark-defaults.conf via the spark.{driver,executor}.extraClassPath parameters (see the example after this list)
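
For reference, the spark-defaults.conf entries we tried looked roughly like this (the exact paths are just where our file happens to live):

    spark.driver.extraClassPath   /disk1/conf/cache.xml
    spark.executor.extraClassPath /disk1/conf/cache.xml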

Do we have to put the Ignite config file on every YARN node for Ignite to work with Spark on YARN? Is there a better approach?


1 Answer


I am not sure why Ignite can't read the configuration from "file:///disk1/conf/cache.xml" or "hdfs:///hdfs_root/conf/cache.xml"; that by itself may be the real issue and is worth investigating.

However, you can still try building the configuration dynamically, for example:

    import java.nio.file.FileSystems;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public static IgniteConfiguration getClientConfiguration(String igniteInstanceName) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        if (igniteInstanceName != null) {
            cfg.setIgniteInstanceName(igniteInstanceName);
            cfg.setConsistentId(igniteInstanceName);
        }

        // Use the container's working directory, so no fixed path is required on the node.
        cfg.setWorkDirectory(FileSystems.getDefault().getPath(".").toAbsolutePath().toString());

        // The Spark driver and executors join the cluster as client nodes.
        cfg.setClientMode(true);

        // Static IP finder: list the server node addresses with the discovery port range.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        List<String> addrs = Arrays.asList("10.0.75.1:47500..47509");
        ipFinder.setAddresses(addrs);

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setIpFinder(ipFinder);
        cfg.setDiscoverySpi(discoSpi);

        return cfg;
    }
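
As a quick sanity check outside of Spark, a sketch of starting a client node with this method ("ClientNode" is the same instance name used further below):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    Ignite ignite = Ignition.start(getClientConfiguration("ClientNode"));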

Alternatively, you can package the XML configuration into the jar archive, so it travels with the application code to every container.
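
If you go that route, a minimal sketch to confirm the packaged file is actually visible on the classpath (the resource name matches the jar example below):

    import java.net.URL;

    // getResource() returns null when the file is missing from the classpath.
    URL cfgUrl = Thread.currentThread().getContextClassLoader()
        .getResource("client_config.xml");
    if (cfgUrl == null)
        throw new IllegalStateException("client_config.xml not found on the classpath");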

If you are going to use the Ignite RDD, the following examples should work with the approaches above:

1) Dynamic configuration:

    JavaIgniteContext<Long, Record> igniteContext = new JavaIgniteContext<>(
        sparkCtx, (IgniteOutClosure<IgniteConfiguration>)() -> {
            try {
                // Build the configuration on each node instead of reading a file.
                return IgniteConfigurationProvider.getClientConfiguration("ClientNode");
            }
            catch (Exception e) {
                return null; // The context will fail to start if this returns null.
            }
        });

2) Using the config from the jar:

    JavaIgniteContext<Long, Record> igniteContext = new JavaIgniteContext<>(
        sparkCtx, "client_config.xml");
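
In either case, a rough sketch of pushing the HDFS data into Ignite through the context (the cache name and pair RDD are illustrative):

    import org.apache.ignite.spark.JavaIgniteRDD;
    import org.apache.spark.api.java.JavaPairRDD;

    // "recordCache" is an example cache name; `pairs` is the
    // JavaPairRDD<Long, Record> produced from the HDFS input.
    JavaIgniteRDD<Long, Record> igniteRdd = igniteContext.fromCache("recordCache");
    igniteRdd.savePairs(pairs);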