
I'm trying to deploy the Apache Spark Pi example from Eclipse to Hadoop YARN. I'm running my own cluster of 3 Linux virtual machines. The Hadoop version on the cluster is 2.7.2 and Spark is 1.6.0, pre-built for Hadoop 2.6.0 and later. I was able to run the Pi example from a node, but when I try to run the Java Pi example from Eclipse on Windows (yarn-cluster mode) I get the error shown below. I found several threads with this error, but most of them were for Cloudera or Hortonworks with some extra variables, or did not solve my problem. I also tried yarn-client mode with the same results. Can somebody help me, please?

Eclipse console output:

16/02/23 11:21:51 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/02/23 11:21:51 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/02/23 11:21:51 INFO Client: Will allocate AM container, with 1384 MB memory including 384 MB overhead
16/02/23 11:21:51 INFO Client: Setting up container launch context for our AM
16/02/23 11:21:51 INFO Client: Setting up the launch environment for our AM container
16/02/23 11:21:51 INFO Client: Preparing resources for our AM container
16/02/23 11:21:53 WARN : Your hostname, uherpc resolves to a loopback/non-reachable address: 172.25.32.214, but we couldn't find any external IP address!
16/02/23 11:21:53 INFO Client: Uploading resource file:/C:/Users/xuherv00/.gradle/caches/modules-2/files-2.1/org.apache.spark/spark-yarn_2.10/1.6.0/ace7b1f6f0c33b48e0323b7b0e7dd8ab458c14a4/spark-yarn_2.10-1.6.0.jar -> hdfs://sparkmaster:9000/user/hduser/.sparkStaging/application_1456222391080_0002/spark-yarn_2.10-1.6.0.jar
16/02/23 11:21:54 INFO Client: Uploading resource file:/C:/Users/xuherv00/workspace/rapidminer5/RapidMiner_Extension_Streaming/lib/spark-examples-1.6.0-hadoop2.6.0.jar -> hdfs://sparkmaster:9000/user/hduser/.sparkStaging/application_1456222391080_0002/spark-examples-1.6.0-hadoop2.6.0.jar
16/02/23 11:22:02 INFO Client: Uploading resource file:/C:/Users/xuherv00/AppData/Local/Temp/spark-266eade5-5049-4b13-9f75-edb5200e3df1/__spark_conf__6296221515875913107.zip -> hdfs://sparkmaster:9000/user/hduser/.sparkStaging/application_1456222391080_0002/__spark_conf__6296221515875913107.zip
16/02/23 11:22:02 INFO SecurityManager: Changing view acls to: xuherv00,hduser
16/02/23 11:22:02 INFO SecurityManager: Changing modify acls to: xuherv00,hduser
16/02/23 11:22:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xuherv00, hduser); users with modify permissions: Set(xuherv00, hduser)
16/02/23 11:22:02 INFO Client: Submitting application 2 to ResourceManager
16/02/23 11:22:02 INFO YarnClientImpl: Submitted application application_1456222391080_0002
16/02/23 11:22:03 INFO Client: Application report for application_1456222391080_0002 (state: ACCEPTED)
16/02/23 11:22:03 INFO Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1456222434780
     final status: UNDEFINED
     tracking URL: http://sparkmaster:8088/proxy/application_1456222391080_0002/
     user: hduser
16/02/23 11:22:04 INFO Client: Application report for application_1456222391080_0002 (state: ACCEPTED)
16/02/23 11:22:05 INFO Client: Application report for application_1456222391080_0002 (state: ACCEPTED)
16/02/23 11:22:06 INFO Client: Application report for application_1456222391080_0002 (state: ACCEPTED)
16/02/23 11:22:07 INFO Client: Application report for application_1456222391080_0002 (state: ACCEPTED)
16/02/23 11:22:08 INFO Client: Application report for application_1456222391080_0002 (state: ACCEPTED)
16/02/23 11:22:09 INFO Client: Application report for application_1456222391080_0002 (state: ACCEPTED)
16/02/23 11:22:10 INFO Client: Application report for application_1456222391080_0002 (state: FAILED)
16/02/23 11:22:10 INFO Client: 
     client token: N/A
     diagnostics: Application application_1456222391080_0002 failed 2 times due to AM Container for appattempt_1456222391080_0002_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://sparkmaster:8088/cluster/app/application_1456222391080_0002Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1456222391080_0002_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1456222434780
     final status: FAILED
     tracking URL: http://sparkmaster:8088/cluster/app/application_1456222391080_0002
     user: hduser
16/02/23 11:22:10 INFO Client: Deleting staging directory .sparkStaging/application_1456222391080_0002
16/02/23 11:22:10 INFO ShutdownHookManager: Shutdown hook called
16/02/23 11:22:10 INFO ShutdownHookManager: Deleting directory C:\Users\xuherv00\AppData\Local\Temp\spark-266eade5-5049-4b13-9f75-edb5200e3df1

Hadoop application log:

User:   hduser
Name:   testApp
Application Type:   SPARK
Application Tags:   
YarnApplicationState:   FAILED
FinalStatus Reported by AM:     FAILED
Started:    Tue Feb 23 11:13:54 +0100 2016
Elapsed:    8sec
Tracking URL:   History
Diagnostics:    
Application application_1456222391080_0002 failed 2 times due to AM Container for appattempt_1456222391080_0002_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://sparkmaster:8088/cluster/app/application_1456222391080_0002Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1456222391080_0002_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.

Logs for container_1456222391080_0002_01_000001 stderr:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Gradle dependencies:

//hadoop
compile 'org.apache.hadoop:hadoop-common:2.7.2'
compile 'org.apache.hadoop:hadoop-hdfs:2.7.2'
compile 'org.apache.hadoop:hadoop-client:2.7.2'
compile 'org.apache.hadoop:hadoop-mapreduce-client-core:2.7.2'
compile 'org.apache.hadoop:hadoop-yarn-common:2.7.2'
compile 'org.apache.hadoop:hadoop-yarn-api:2.7.2'

//spark
compile 'org.apache.spark:spark-core_2.10:1.6.0'
compile 'org.apache.spark:spark-sql_2.10:1.6.0'
compile 'org.apache.spark:spark-streaming_2.10:1.6.0'
compile 'org.apache.spark:spark-catalyst_2.10:1.6.0'
compile 'org.apache.spark:spark-yarn_2.10:1.6.0'
compile 'org.apache.spark:spark-network-shuffle_2.10:1.6.0'
compile 'org.apache.spark:spark-network-common_2.10:1.6.0'
compile 'org.apache.spark:spark-network-yarn_2.10:1.6.0'

Java class:

package mypackage;

import java.io.File;
import java.lang.reflect.Method;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.junit.Test;

public class ExampleApp {
    private String appName = "testApp";
//  private String mode = "yarn-client";
    private String mode = "yarn-cluster";
    private File appJar = new File("lib/spark-examples-1.6.0-hadoop2.6.0.jar");
    private URI appJarUri = appJar.toURI();
    private String hadoopPath = "E:\\store\\hadoop";

    @Test
    public void deployPiToYARN() {
        String[] args = new String[] {
                // the name of your application
                "--name", appName,

                // memory for driver (optional)
                "--driver-memory", "1000M",

                // path to your application's JAR file
                // required in yarn-cluster mode
                "--jar", appJarUri.toString(),

                // name of your application's main class (required)
                "--class", "org.apache.spark.examples.SparkPi",

                // comma separated list of local jars that want
                // SparkContext.addJar to work with
//              "--addJars",
//              "lib/spark-assembly-1.6.0-hadoop2.6.0.jar",

                // argument 1 to Spark program
                 "--arg",
                 "10",
        };

        System.setProperty("hadoop.home.dir", hadoopPath);

        System.setProperty("HADOOP_USER_NAME", "hduser");
        try {
            addHadoopConfToClassPath(hadoopPath);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // create a Hadoop Configuration object
        Configuration config = new Configuration();
        config.set("yarn.resourcemanager.address", "172.25.32.192:8050");

        // identify that you will be using Spark as YARN mode
        System.setProperty("SPARK_YARN_MODE", "true");

        // create an instance of SparkConf object
        SparkConf sparkConf = new SparkConf().setAppName(appName);
        sparkConf = sparkConf.setMaster(mode);
        sparkConf = sparkConf.set("spark.executor.memory","1g");

        // create ClientArguments, which will be passed to Client
        ClientArguments cArgs = new ClientArguments(args, sparkConf);
        // create an instance of yarn Client client
        Client client = new Client(cArgs, config, sparkConf);

//      client.submitApplication();
        // submit Spark job to YARN
        client.run();
    }

    private void addHadoopConfToClassPath(String path) throws Exception {
        File f = new File(path);
        URL u = f.toURI().toURL();
        URLClassLoader urlClassLoader = (URLClassLoader) ClassLoader.getSystemClassLoader();
        Class<URLClassLoader> urlClass = URLClassLoader.class;
        Method method = urlClass.getDeclaredMethod("addURL", new Class[]{URL.class});
        method.setAccessible(true);
        method.invoke(urlClassLoader, new Object[]{u});
    }
}

core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://sparkmaster:9000</value>
    </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
        </property>
</configuration>

yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>sparkmaster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>sparkmaster:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>sparkmaster:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>sparkmaster:8050</value>
    </property>
</configuration>

1 Answer


You should not launch Spark on YARN from Eclipse; use spark-submit (the SparkSubmit entry point) instead. You can, however, use local mode from Eclipse.
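For example, a local-mode version of the Pi computation that runs entirely inside the Eclipse JVM could look like this. This is a sketch based on the standard JavaSparkPi example shipped with Spark; only the app name is taken from your code, everything else is illustrative:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalPi {
    public static void main(String[] args) {
        // local[*] runs Spark inside this JVM, so no YARN cluster
        // (and no Spark jar upload) is needed
        SparkConf conf = new SparkConf().setAppName("testApp").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        int slices = 10;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }

        JavaRDD<Integer> dataSet = sc.parallelize(l, slices);
        // count random points that fall inside the unit circle
        int count = dataSet.map(i -> {
            double x = Math.random() * 2 - 1;
            double y = Math.random() * 2 - 1;
            return (x * x + y * y < 1) ? 1 : 0;
        }).reduce((a, b) -> a + b);

        System.out.println("Pi is roughly " + 4.0 * count / n);
        sc.stop();
    }
}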

SparkSubmit does many things for you, including uploading dependencies such as the Spark assembly jar to the YARN cluster, where the ApplicationMaster and executors reference them. That is why you are getting the error above: the AM container cannot load org.apache.spark.deploy.yarn.ApplicationMaster because the Spark jars are not on its classpath.
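If you really want to trigger the submission from Java code instead of the command line, the supported way is the org.apache.spark.launcher.SparkLauncher API (available since Spark 1.4), which drives spark-submit for you. A minimal sketch, reusing the jar, main class, master, and memory settings from your question; the Spark home path is an assumption you will need to adjust:

import org.apache.spark.launcher.SparkLauncher;

public class LauncherExample {
    public static void main(String[] args) throws Exception {
        // SparkLauncher shells out to spark-submit, which uploads the Spark
        // assembly and your app jar to the cluster before starting the AM container
        Process spark = new SparkLauncher()
                .setSparkHome("/usr/local/spark")  // assumption: adjust to your installation
                .setAppResource("lib/spark-examples-1.6.0-hadoop2.6.0.jar")
                .setMainClass("org.apache.spark.examples.SparkPi")
                .setMaster("yarn-cluster")
                .setAppName("testApp")
                .setConf(SparkLauncher.DRIVER_MEMORY, "1000m")
                .setConf(SparkLauncher.EXECUTOR_MEMORY, "1g")
                .addAppArgs("10")
                .launch();
        int exitCode = spark.waitFor();
        System.out.println("spark-submit finished with exit code " + exitCode);
    }
}

Note that SparkLauncher starts spark-submit as a child process, so a Spark distribution must be installed on the machine you launch from. Alternatively, just run bin/spark-submit on a cluster node, as you already did for the working Pi run.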