I'm trying to create an EMR Spark cluster with a single custom step. The cluster is created successfully; however, the step is not defined correctly.

UPDATE

I tried to launch the same cluster via the web console and got the same result. Although I specify the JAR location, when I save the step the JAR location is set to command-runner.jar and the JAR path I provided is added to the Arguments list.

CLI Command:

aws emr create-cluster --name 'emr-test' \
--applications Name=Spark \
--release-label emr-5.11.1 \
--auto-terminate \
--instance-type m3.xlarge \
--instance-count 1 \
--ec2-attributes SubnetId=subnet-000000 \
--steps '[{
    "Type": "SPARK",
    "Name": "spark-program",
    "Args": ["--class","--init-keyspaces"],
    "Jar": "s3://mybucket/snapshots/0.1.0-SNAPSHOT/2.11/my-spark-assembly-0.1.0-SNAPSHOT.jar",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "MainClass":"com.myspark.data.consumers.jobs.MyJob"
}]' \
--use-default-roles \
--log-uri 's3://mybucket/logs' \
--tags Name='spark-program' Environment='test'

Result:

This is what the Steps tab in the console shows:

JAR location: command-runner.jar
Main class: None
Arguments: spark-submit --class --init-keyspaces
Action on failure: Terminate cluster

Expected:

JAR location: s3://mybucket/snapshots/0.1.0-SNAPSHOT/2.11/my-spark-assembly-0.1.0-SNAPSHOT.jar
Main class: com.myspark.data.consumers.jobs.MyJob
Arguments: spark-submit --class --init-keyspaces
Action on failure: Terminate cluster

I've confirmed the S3 bucket and JAR are in the correct location. I'm getting the same result when using standard syntax as well.

1 Answer

It turned out that my expectation was incorrect. When a step is created via the CLI with only JAR arguments, a Custom JAR step is created. If Spark arguments (e.g. --conf) are also passed to the CLI, a Spark step is created.

These two step types look different in the web console. For example, the JAR location is set to command-runner.jar for Spark steps, whereas for a Custom JAR step it is set to the JAR's S3 path.
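
To make the difference concrete, here is a minimal sketch of the two step definitions as they might be passed to --steps. It reuses the placeholder bucket, JAR and class names from the question. It assumes that --init-keyspaces is an argument for the application itself and that, for a Spark step, the application JAR and --class belong in Args, because the step is run through spark-submit. Treat it as an illustration, not a verified fix.

```shell
#!/usr/bin/env bash
# Placeholders taken from the question; adjust to your own bucket and class.
APP_JAR='s3://mybucket/snapshots/0.1.0-SNAPSHOT/2.11/my-spark-assembly-0.1.0-SNAPSHOT.jar'
MAIN_CLASS='com.myspark.data.consumers.jobs.MyJob'

# Spark step: no "Jar" or "MainClass" keys. Spark options come first in Args,
# then the application JAR, then the application's own arguments (assumed here
# to be --init-keyspaces). The console shows command-runner.jar for this type.
SPARK_STEP=$(cat <<EOF
[{
    "Type": "SPARK",
    "Name": "spark-program",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Args": ["--class", "${MAIN_CLASS}", "${APP_JAR}", "--init-keyspaces"]
}]
EOF
)

# Custom JAR step: "Jar" and "MainClass" are honoured, and the console shows
# the S3 path as the JAR location. The JAR is not launched via spark-submit.
CUSTOM_JAR_STEP=$(cat <<EOF
[{
    "Type": "CUSTOM_JAR",
    "Name": "custom-jar-program",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Jar": "${APP_JAR}",
    "MainClass": "${MAIN_CLASS}",
    "Args": ["--init-keyspaces"]
}]
EOF
)

# Print both definitions so they can be inspected before use.
printf '%s\n' "$SPARK_STEP"
printf '%s\n' "$CUSTOM_JAR_STEP"
```

Either variable can then be passed in place of the inline JSON, for example as --steps "$SPARK_STEP" in the create-cluster command from the question.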

AWS documentation on adding a Spark step: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html