I am trying to use boto3 to launch an EMR cluster like this:

    import boto3

    client = boto3.client('emr')
    client.run_job_flow(**kwargs)

I'm getting this error:

    ClientError: An error occurred (ValidationException) when calling the
    RunJobFlow operation: InstanceProfile is required for creating cluster.

(This is boto3 version 1.4.2 on Python 3.5.)

The documentation at http://boto3.readthedocs.io/en/latest/reference/services/emr.html?highlight=emr#EMR.Client.run_job_flow does not mention an InstanceProfile parameter.

I have tried the argument from my (working) AWS CLI script:

    --ec2-attributes '{"KeyName":"MyKeyPair",
                    "InstanceProfile":"EMR_EC2_DefaultRole",
                    "AvailabilityZone":"us-east-1c",
                    "EmrManagedSlaveSecurityGroup":"sg-7c753416",
                    "EmrManagedMasterSecurityGroup":"sg-7e753414"}'

I tried adding it at various places in the kwargs, but no luck.

Can anyone give me a hint or show a working example?

Any help appreciated.

1 Answer

Tried one more thing, and it worked!

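    JobFlowRole='EMR_EC2_DefaultRole'

This is a top-level keyword argument to client.run_job_flow() (that is, a top-level key in **kwargs), not something nested under Instances. It is the boto3 counterpart of the CLI's InstanceProfile setting, which is why the error message names InstanceProfile even though the parameter is called JobFlowRole.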

And here's the example that showed me how to do it:

http://tech.adroll.com/blog/spark/2016/01/25/spark-on-emr.html
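
For anyone who wants to see where everything goes, here is a minimal sketch of how the CLI --ec2-attributes from the question might map onto run_job_flow() kwargs. The key pair, availability zone, security groups and JobFlowRole come from the question; the cluster name, ReleaseLabel, instance types, instance count, region and ServiceRole='EMR_DefaultRole' are assumptions of mine, and the two helper functions are just illustrative names, so adjust them to your account.

```python
def build_run_job_flow_kwargs(name="example-cluster"):
    """Build keyword arguments for EMR.Client.run_job_flow().

    Values marked "assumed" are placeholders, not taken from the question.
    """
    return {
        "Name": name,                          # assumed cluster name
        "ReleaseLabel": "emr-5.2.0",           # assumed EMR release
        "Instances": {
            "MasterInstanceType": "m3.xlarge",  # assumed
            "SlaveInstanceType": "m3.xlarge",   # assumed
            "InstanceCount": 3,                 # assumed
            "KeepJobFlowAliveWhenNoSteps": True,
            # CLI "KeyName" becomes Ec2KeyName under Instances.
            "Ec2KeyName": "MyKeyPair",
            # CLI "AvailabilityZone" goes under Instances.Placement.
            "Placement": {"AvailabilityZone": "us-east-1c"},
            "EmrManagedMasterSecurityGroup": "sg-7e753414",
            "EmrManagedSlaveSecurityGroup": "sg-7c753416",
        },
        # CLI "InstanceProfile" is the top-level JobFlowRole, not part of Instances.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        # Assumed: the default EMR service role exists in the account.
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(kwargs, region_name="us-east-1"):
    """Start the cluster and return its job flow (cluster) id.

    Needs AWS credentials; region_name is an assumed default.
    """
    import boto3  # imported here so building kwargs works without boto3

    client = boto3.client("emr", region_name=region_name)
    response = client.run_job_flow(**kwargs)
    return response["JobFlowId"]
```

With credentials configured, launch_cluster(build_run_job_flow_kwargs()) should start the cluster and return its id. Keeping the kwargs in a separate function makes it easy to print and inspect them before anything is actually launched.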

Hope this helps somebody else.