I am trying to run a Spark application on AWS EMR, following the instructions at http://blogs.aws.amazon.com/bigdata/post/Tx15AY5C50K70RV/Installing-Apache-Spark-on-an-Amazon-EMR-Cluster
It looks like Spark is installed correctly during bootstrapping. However, when my step runs, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD
at SparkCCF.main(SparkCCF.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more
I load the fat jar (built with "sbt assembly") and the application's input files from S3. I built my app against Spark 1.1.0. The EMR cluster is on AMI 3.2.1 with Hadoop 2.4.
Do I need to build the Spark app against the "Pre-built for Hadoop 2.4" package, or should plain Spark 1.1.0 work?