
I am building a JAR with Eclipse/Maven and running it on EMR.

Here is my pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">







        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->

        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->

        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->


            <!-- Maven Assembly Plugin -->
                        <phase>package</phase> <!-- packaging phase -->

This is how I deploy and run my JAR on the EMR cluster:

spark-submit --deploy-mode cluster --class financialLineItem.FinancialLineItem s3://path/SparkApplication-SQL-jar-with-dependencies.jar

When I run my code in a Zeppelin notebook it runs fine, but with spark-submit it throws the following exception:

Exception in thread "main" java.lang.ClassNotFoundException: financialLineItem.FinancialLineItem

This is what my project setup looks like: [image of the project structure]

How can I correct this?

Also, I followed the document below to create the Spark application and submit it on EMR: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html. There, the master URL is set neither in the Spark job configuration nor on the spark-submit command line.


2 Answers


You are missing sourceDirectory in your pom.xml.

According to the Maven documentation on the Standard Directory Layout, the default sourceDirectory is src/main/java. Since your project keeps its sources in src/main/scala, the classes are not being compiled.

Add this under your build configuration:


                <!-- If you have classpath issue like NoDefClassError,... -->
                <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
        Maven Assembly Plugin
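
For illustration, the sourceDirectory change described above might look like the following sketch. The `<sourceDirectory>` value comes from the src/main/scala layout mentioned in this answer. The scala-maven-plugin entry and its version are assumptions (the question's plugin list is not visible here), so adjust or drop that part to match what the pom.xml already declares, and keep the existing Maven Assembly Plugin configuration alongside it.

```xml
<build>
    <!-- Point Maven at the Scala sources; the default is src/main/java -->
    <sourceDirectory>src/main/scala</sourceDirectory>

    <plugins>
        <!-- Assumption: scala-maven-plugin is what compiles the Scala sources -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <!-- Assumed version; use the one that matches your Scala setup -->
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>

        <!-- Keep the existing Maven Assembly Plugin configuration here -->
    </plugins>
</build>
```

After rebuilding, listing the contents of the jar-with-dependencies archive should show financialLineItem/FinancialLineItem.class if the Scala sources were compiled.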

When you run spark-submit in cluster mode, the driver runs on a different machine than the client, so the JAR you pass to spark-submit needs to be placed on the driver's classpath like this:

--driver-class-path s3://path/SparkApplication-SQL-jar-with-dependencies.jar

So you can try the following command:

spark-submit --deploy-mode cluster --class financialLineItem.FinancialLineItem --driver-class-path s3://path/SparkApplication-SQL-jar-with-dependencies.jar s3://path/SparkApplication-SQL-jar-with-dependencies.jar