
I have a Cloudera CDH 5.2 Hadoop cluster with Apache Spark 1.5.0.

Can I run my app from IntelliJ IDEA or from my local PC, using the cluster's YARN, Spark, and HDFS?

Or should I upload the jar to the master node via FTP and run it with spark-submit?


1 Answer


Yes, you can run your job directly from the IDE if you follow these steps:

  1. Add the spark-yarn package to your project dependencies (it can be marked as provided)
  2. Add the directory with the Hadoop configuration (HADOOP_CONF_DIR) to the project classpath
  3. Copy the Spark assembly jar to HDFS (a sketch follows this list)
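
For step 3, a one-off hdfs dfs -put is enough, but here is a minimal Java sketch using Hadoop's FileSystem API; the class name and both paths are illustrative (adjust them to your CDH layout), not part of the original answer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadAssembly {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml/hdfs-site.xml from HADOOP_CONF_DIR on the classpath (step 2)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Equivalent to: hdfs dfs -put <local assembly jar> <hdfs path>
        fs.copyFromLocalFile(
            new Path("/opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly.jar"),
            new Path("/user/spark/share/lib/spark-assembly.jar"));
        fs.close();
    }
}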

Then configure the Spark context in your application with a config like this:

SparkConf sparkConfig = new SparkConf()
    .setMaster("yarn-client")                                    // driver runs locally, executors on YARN
    .set("spark.yarn.queue", "if_you_are_using_scheduler")       // only needed if you use a scheduler queue
    .set("spark.yarn.jar", "hdfs:///path/to/assembly/on/hdfs");  // the assembly jar from step 3

If your Hadoop deployment is secured (Kerberos), you also need to:

  • switch to a JRE with the JCE unlimited-strength policy files installed
  • add krb5.conf to the JVM parameters (-Djava.security.krb5.conf=/path/to/local/krb5.conf)
  • call kinit inside your environment (or log in from code, as sketched below)
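
For the Kerberos part, a minimal sketch of doing the login from code instead of the shell; UserGroupInformation is Hadoop's actual login API, but the principal and paths are placeholders:

import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        // Same effect as passing -Djava.security.krb5.conf=... on the command line
        System.setProperty("java.security.krb5.conf", "/path/to/local/krb5.conf");

        // Programmatic alternative to running kinit before starting the app
        UserGroupInformation.loginUserFromKeytab(
            "user@EXAMPLE.COM", "/path/to/user.keytab");
    }
}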

I tested this solution some time ago on Spark 1.2.0 on CDH as well, but it should also work on 1.5. Remember that this approach makes your local machine the Spark driver, so be aware of firewall issues between the driver and the executors: your local machine must be reachable from the Hadoop nodes.
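
If the executors cannot connect back, one option is to pin the driver's address and port and open them in your firewall. spark.driver.host and spark.driver.port are standard Spark settings; the values below are placeholders you would add to the sparkConfig shown above:

sparkConfig
    .set("spark.driver.host", "192.0.2.10")  // an address of your PC that cluster nodes can reach
    .set("spark.driver.port", "40000");      // fixed port to allow through the firewall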