I'm trying to run spark-submit as an EMR step using the boto3 client. After I execute the code below, the step is submitted and then fails after a few seconds. The actual command line from the step logs works if I run it manually on the EMR master node.
The controller log contains barely readable garbage that looks as if several processes were writing to it concurrently.
UPD: I have also tried command-runner.jar, and EMR versions 4.0.0 and 4.1.0.
Any ideas would be appreciated.
The code fragment:
import boto3


class ProblemExample:
    def run(self):
        # cluster_id is defined elsewhere (the ID of the running EMR cluster)
        session = boto3.Session(profile_name='emr-profile')
        client = session.client('emr')
        response = client.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[
                {
                    'Name': 'string',
                    'ActionOnFailure': 'CONTINUE',
                    'HadoopJarStep': {
                        'Jar': 's3n://elasticmapreduce/libs/script-runner/script-runner.jar',
                        'Args': [
                            '/usr/bin/spark-submit',
                            '--verbose',
                            '--class',
                            'my.spark.job',
                            '--jars', '<dependencies>',
                            '<my spark job>.jar'
                        ]
                    }
                },
            ]
        )
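
For reference, here is a minimal sketch of what the command-runner.jar variant mentioned in the update might look like, together with a way to read the step state back through the API instead of the controller log. The helper names (build_spark_step, submit_step, step_status) and their parameters are hypothetical and exist only for illustration. It assumes `client` is a boto3 EMR client like the one created above and that the cluster runs a 4.x release where command-runner.jar is available. It is not a confirmed fix.

```python
def build_spark_step(name, main_class, app_jar, dependency_jars=None):
    """Hypothetical helper: build a step dict that runs spark-submit
    through command-runner.jar instead of script-runner.jar."""
    args = ['spark-submit', '--verbose', '--class', main_class]
    if dependency_jars:
        # spark-submit takes --jars as a single comma-separated value
        args += ['--jars', ','.join(dependency_jars)]
    args.append(app_jar)
    return {
        'Name': name,
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': args,
        },
    }


def submit_step(client, cluster_id, step):
    """Submit one step and return its ID (client is a boto3 EMR client)."""
    response = client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response['StepIds'][0]


def step_status(client, cluster_id, step_id):
    """Return (state, failure_details) for a step.

    failure_details is an empty dict when the API reports none.
    """
    step = client.describe_step(ClusterId=cluster_id, StepId=step_id)['Step']
    status = step['Status']
    return status['State'], status.get('FailureDetails', {})
```

Calling step_status a few seconds after submit_step should show whether the step reached FAILED and, when the service provides it, a reason and a log file location that may be easier to read than the interleaved controller log.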