I have a Hadoop cluster (Cloudera CDH 5.2) with Apache Spark 1.5.0.
Can I run my app from IntelliJ IDEA or a local PC, using the cluster's YARN, Spark and HDFS?
Or should I send the jar via FTP to the master node and run it through spark-submit?
Yes, you can run your job directly from the IDE if you follow these steps:

- Add the spark-yarn package to your project dependencies (it can be marked as provided).
- Then configure the Spark context in your application using config:
SparkConf sparkConfig = new SparkConf()
    .setMaster("yarn-client")
    .set("spark.yarn.queue", "if_you_are_using_scheduler")
    .set("spark.yarn.jar", "hdfs:///path/to/assembly/on/hdfs");
If your Hadoop is a secured deployment, you also need to:

- add krb5.conf to the Java parameters (-Djava.security.krb5.conf=/path/to/local/krb5.conf)
- run kinit inside your environment
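If you prefer to keep those two steps inside the application instead of JVM flags and a manual kinit, a rough sketch of the programmatic equivalent could look like this. The principal, keytab path, and krb5.conf path are placeholders, and UserGroupInformation comes from the Hadoop client libraries already on your classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        // Equivalent to -Djava.security.krb5.conf=...; must be set before any Kerberos call.
        System.setProperty("java.security.krb5.conf", "/path/to/local/krb5.conf");

        // Roughly what kinit gives you, but obtained inside the JVM from a keytab.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab");

        System.out.println("Logged in as: " + UserGroupInformation.getCurrentUser());
    }
}
```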
I tested this solution some time ago on Spark 1.2.0 on CDH as well, but it should work on 1.5. Remember that this approach makes your local machine the Spark driver, so be aware of possible firewalling issues between the driver and the executors: your local machine must be reachable from the Hadoop nodes.
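On the firewalling point, one option is to pin the driver's advertised host and ports so only a handful of known ports has to be opened toward your machine. A sketch, with placeholder host name and port values:

```java
import org.apache.spark.SparkConf;

public class DriverNetworkConfig {
    // Apply to the same sparkConfig before creating the context.
    public static SparkConf withFixedDriverPorts(SparkConf conf) {
        return conf
            .set("spark.driver.host", "my-workstation.example.com")  // address executors use to call back
            .set("spark.driver.port", "51000")                       // fixed driver RPC port instead of a random one
            .set("spark.blockManager.port", "51001");                // fixed block manager port
    }
}
```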