3 votes

I am trying to run a Spark step on AWS Data Pipeline and am getting the following exception:

amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform.
    at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:67)
    at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
    at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
    at java.lang.Thread.run(Thread.java:748)
Caused by: amazonaws.datapipeline.taskrunner.TaskExecutionException: EMR job '@DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' with jobFlowId 'j-2E7PU1OK3GIJI' is failed with status 'FAILED' and reason 'Cluster ready after last step completed.'. Step 'df-0693981356F3KEDFQ6GG_@DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' is in status 'FAILED' with reason 'null'
    at amazonaws.datapipeline.cluster.EmrUtil.runSteps(EmrUtil.java:286)
    at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:63)
    ... 7 more

The cluster is getting spun up correctly.

Here is a screenshot of the pipeline:

[screenshot]

I think there is some issue with the 'step' field in the activity. Any input would be helpful.

1
This is unsalvageable without a Minimal, Complete, and Verifiable example and the whole error stack. – eliasah
I will upload the whole error stack. Meanwhile: 1) Is there anything faulty in the steps, i.e. in the spark-submit command? As per docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/…, we are supposed to use commas, right? 2) We can use S3 for inputs, right? – Sanchay
Have you tried to spin up a cluster and submit a similar Spark step manually? The trace does not expose the cause of the failure: Step 'df-0693981356F3KEDFQ6GG_@DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' is in status 'FAILED' with reason 'null'. Can you access the logs on S3? – Alexandre Dupriez

1 Answer

2 votes

The issue was twofold:

1) The step script should have been comma-separated, something like:

command-runner.jar,spark-submit,--deploy-mode,cluster,--class,com.amazon.Main

Link: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emrcluster.html
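To illustrate, here is a minimal sketch of how the comma-separated step string might sit inside an EmrActivity object in a pipeline definition. The object ids, class name, and S3 jar path are hypothetical placeholders, not values from the failing pipeline:

```json
{
  "id": "DefaultEmrActivity1",
  "type": "EmrActivity",
  "runsOn": { "ref": "EmrClusterObj" },
  "step": "command-runner.jar,spark-submit,--deploy-mode,cluster,--class,com.amazon.Main,s3://example-bucket/jars/app.jar"
}
```

Data Pipeline splits the step string on commas to build the step's argument list, which is why spaces between arguments (as in a normal shell command line) do not work here.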

2) EmrActivity does not support staging, so we cannot use ${INPUT1_STAGING_DIR} in the step instruction. For now, I have replaced it with hardcoded S3 URLs.
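Concretely, this means passing the input and output locations to the Spark application as plain arguments at the end of the step string instead of relying on a staging variable. A sketch with hypothetical bucket names and paths:

```json
{
  "id": "DefaultEmrActivity1",
  "type": "EmrActivity",
  "runsOn": { "ref": "EmrClusterObj" },
  "step": "command-runner.jar,spark-submit,--deploy-mode,cluster,--class,com.amazon.Main,s3://example-bucket/jars/app.jar,s3://example-bucket/input/,s3://example-bucket/output/"
}
```

The Spark application then reads its input and output paths from its own argument list (e.g. args(0) and args(1) in the main class) rather than from a staged local directory.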