I am trying to create an AWS Data Pipeline task that creates an EMR cluster and runs a simple wordcount.py Spark program. In my pipeline definition the step is simply:
"myEmrStep": "s3://test/wordcount.py,s3://test/data/abc.txt,s3://test/output/outfile5/",
Now, when I activate the task, I get an error like:
Exception in thread "main" java.io.IOException: Error opening job jar: /mnt/var/lib/hadoop/steps/s-187JR8H3XT8N7/wordcount.py
    at org.apache.hadoop.util.RunJar.run(RunJar.java:160)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:215)
    at
It seems the step is trying to run the program with Java (as a jar) instead of Python. Any ideas, please?
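For context, my guess is that the step needs to invoke spark-submit explicitly (for example via EMR's command-runner.jar) rather than handing the .py file straight to the step runner, so I was considering something like the following untested variant of my step definition (same paths as above):

```json
"myEmrStep": "command-runner.jar,spark-submit,s3://test/wordcount.py,s3://test/data/abc.txt,s3://test/output/outfile5/",
```

I have not confirmed this is the right format for an EmrActivity step, though, which is why I am asking.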
Thanks.