
I have installed the Cloudera VM (single node), and inside this VM I have Spark running on top of YARN. I would like to use the Eclipse IDE (with the Scala plugin) for testing/learning with Spark.

If I instantiate the SparkContext as follows, everything works as I expected:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

val sparkConf = new SparkConf().setAppName("TwitterPopularTags").setMaster("local[2]")
val sc = new SparkContext(sparkConf)

However, if I now try to connect to the local YARN cluster by changing the master to 'yarn-client', it does not work:

val master = "yarn-client"
val sparkConf = new SparkConf().setAppName("TwitterPopularTags").setMaster(master)
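
For context, my understanding is that yarn-client mode launched from an IDE on Spark 1.x usually also needs the Spark assembly jar to be locatable by the YARN containers. A minimal sketch of what I mean is below; the spark.yarn.jar path is an assumption based on a default CDH layout and would need to match whatever the VM actually has:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: yarn-client mode from an IDE, with the assembly location made explicit.
// The path below is an assumption (default CDH install under /usr/lib/spark/lib/).
val sparkConf = new SparkConf()
  .setAppName("TwitterPopularTags")
  .setMaster("yarn-client")
  // Tell YARN where to find the Spark assembly instead of uploading it from the IDE.
  .set("spark.yarn.jar", "local:/usr/lib/spark/lib/spark-assembly.jar")

val sc = new SparkContext(sparkConf)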

Specifically, I'm getting the following errors:

Error details displayed in the Eclipse console:

[screenshot of the stack trace from the Eclipse console]

Error details from the NodeManager logs:

[screenshot of the NodeManager log excerpt]

Here are the things I have tried so far:

1. Dependencies: I added all the dependencies through the Maven repository. The Cloudera version is 5.5, the corresponding Hadoop version is 2.6.0, and the Spark version is 1.5.0 (a build-file sketch of roughly equivalent dependencies follows the list below).

2. Configurations: I added 3 path variables to the Eclipse classpath (a quick check that the launched JVM actually sees them is also sketched after the list):

  • SPARK_CONF_DIR=/etc/spark/conf/
  • HADOOP_CONF_DIR=/usr/lib/hadoop/
  • YARN_CONF_DIR=/etc/hadoop/conf.cloudera.yarn/
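
As a sketch, the dependencies from step 1 would look roughly like this in build.sbt form. The CDH-specific version suffixes and the Cloudera repository URL are assumptions; adjust them to the artifacts your installation actually provides:

// build.sbt sketch for the dependencies in step 1 (versions and repo URL are assumptions).
scalaVersion := "2.10.4"

resolvers += "Cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.5.0-cdh5.5.0",
  "org.apache.spark" %% "spark-streaming" % "1.5.0-cdh5.5.0",
  // The yarn module is typically needed on the classpath for yarn-client mode.
  "org.apache.spark" %% "spark-yarn"      % "1.5.0-cdh5.5.0",
  "org.apache.hadoop" % "hadoop-client"   % "2.6.0-cdh5.5.0"
)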
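
And the quick check mentioned in step 2: a tiny snippet that prints whether those three locations are visible as environment variables to the JVM Eclipse launches (just a sanity check, not a fix in itself):

// Print the config locations as seen by the running JVM.
Seq("SPARK_CONF_DIR", "HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { name =>
  println(s"$name = ${sys.env.getOrElse(name, "<not set in this JVM>")}")
}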

Can anybody clarify what the problem is here and how to solve it?

Comment from Norman D: Please check the error details in YARN's JobHistory Server: i.stack.imgur.com/aI4SD.png

1 Answer


I worked around it! I still don't understand what the exact problem is, but I created a folder with my username in HDFS, i.e. the /user/myusername directory, and it worked. Anyway, I have since switched to the Hortonworks distribution and found it much smoother to get started with than the Cloudera distribution.
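
If anyone needs the same workaround without switching distributions, the directory can be created from the shell with `hdfs dfs -mkdir -p /user/<yourname>` (usually run as the hdfs user), or programmatically. A minimal sketch using the Hadoop FileSystem API is below; it assumes HADOOP_CONF_DIR is set so the Configuration picks up the cluster settings:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch of the workaround: make sure /user/<current user> exists in HDFS.
val fs   = FileSystem.get(new Configuration())
val home = new Path(s"/user/${System.getProperty("user.name")}")
if (!fs.exists(home)) {
  fs.mkdirs(home)  // may require a user with write permission on /user in HDFS
}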