I am trying to run a Spark application on AWS EMR, following the instructions at http://blogs.aws.amazon.com/bigdata/post/Tx15AY5C50K70RV/Installing-Apache-Spark-on-an-Amazon-EMR-Cluster
It looks like Spark is installed correctly during bootstrapping. However, when my step runs, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD
at SparkCCF.main(SparkCCF.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more
I load the fat jar (built with "sbt assembly") and the application's input files from S3. I built my app against Spark 1.1.0. The EMR cluster is on AMI 3.2.1 with Hadoop 2.4.
Do I need to build the Spark app against the "Pre-built for Hadoop 2.4" package, or should plain Spark 1.1.0 work?