0
votes

I am following this article to install a Spark cluster on EMR (Amazon Elastic MapReduce): https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923

To create the Spark cluster I run the following command, but the cluster hits a bootstrap failure every single time. I have not been able to resolve this, and it would be great if anyone could help.

aws emr create-cluster --name SparkCluster --ami-version 3.2 \
--instance-type m3.xlarge --instance-count 3 --ec2-attributes \
KeyName=MYKEY --applications Name=Hive --bootstrap-actions \
Path=s3://support.elasticmapreduce/spark/install-spark

SOLVED: use this:

aws emr create-cluster --name SparkCluster --ami-version 3.7 \
--instance-type m3.xlarge --instance-count 3 --service-role \
EMR_DefaultRole --ec2-attributes \
KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole \
--applications Name=Hive --bootstrap-actions \
Path=s3://support.elasticmapreduce/spark/install-spark 

3 Answers

2
votes

Summary of the answer (it took a bit of back and forth in comments) that worked for this user given the user's SSH key and IAM roles:

aws emr create-cluster --name SparkCluster --ami-version 3.7 --instance-type m3.xlarge --instance-count 3 --service-role EMR_DefaultRole --ec2-attributes KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole --applications Name=Hive --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark

Explanations of EMR IAM roles can be found at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-creatingroles.html and http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-launch-jobflow.html
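If the two default roles named in the command above do not exist in the account yet, the AWS CLI can create them with `aws emr create-default-roles`. The following is only a sketch, assuming the AWS CLI is installed and configured with credentials. The `DRY_RUN` variable and the `run` helper are hypothetical names added for illustration; by default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# DRY_RUN and run() are illustrative helpers, not part of the AWS CLI.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create EMR_DefaultRole and EMR_EC2_DefaultRole in the account.
run aws emr create-default-roles

# Launch the cluster with both roles attached, as in the answer above.
# KeyName=emr is this user's key pair; substitute your own.
run aws emr create-cluster --name SparkCluster --ami-version 3.7 \
    --instance-type m3.xlarge --instance-count 3 \
    --service-role EMR_DefaultRole \
    --ec2-attributes KeyName=emr,InstanceProfile=EMR_EC2_DefaultRole \
    --applications Name=Hive \
    --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark
```

Setting `DRY_RUN=0` in the environment makes the script execute the commands instead of printing them.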

1
votes

The 4th point under the section "Spark with YARN on an Amazon EMR cluster" at the link you provided says the following:

Substitute "MYKEY" value for the KeyName parameter with the name of the EC2 key pair you want to use to SSH into the master node of your EMR cluster.

As far as I can see, you have not replaced MYKEY with the name of your own EC2 key pair. Try changing it to the name of an existing EC2 key pair that you have already created.

If you do not have a key pair yet, you can create one using any of several methods, one of which is described in this link.
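One way to check for the key pair from the command line is sketched below, assuming the AWS CLI is installed and configured. The `ensure_key_pair` function and the `DRY_RUN` and `KEY_NAME` variables are hypothetical names for illustration, and `MYKEY` is still a placeholder. By default the script only prints the lookup command it would run.

```shell
#!/bin/sh
# KEY_NAME, DRY_RUN and ensure_key_pair are illustrative names,
# not part of the AWS CLI. MYKEY is a placeholder for your key pair name.
KEY_NAME="${KEY_NAME:-MYKEY}"
DRY_RUN="${DRY_RUN:-1}"

ensure_key_pair() {
    name="$1"
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: only show the lookup that would be performed.
        echo "aws ec2 describe-key-pairs --key-names $name"
        return 0
    fi
    # describe-key-pairs exits non-zero when the key pair does not exist.
    if aws ec2 describe-key-pairs --key-names "$name" >/dev/null 2>&1; then
        echo "key pair '$name' exists"
    else
        echo "key pair '$name' not found; creating it"
        # Save the private key locally and restrict its permissions for SSH.
        aws ec2 create-key-pair --key-name "$name" \
            --query 'KeyMaterial' --output text > "$name.pem"
        chmod 400 "$name.pem"
    fi
}

ensure_key_pair "$KEY_NAME"
```

The name passed here is the same value that goes into `KeyName=` in the `create-cluster` command.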

Update (from the comments below)

From your pictures, it seems there is a problem downloading the bootstrap action file from S3. I am not sure what the cause of the problem could be, but you might want to change the AMI and launch EMR with a different AMI version, 3.0, for example.

-1
votes

There is another way to start a Spark cluster in EMR directly from the console.

Step 1 - Go to the EMR section of the AWS console and click on create cluster.

Step 2 - Go to bootstrap actions in the configuration and add this path: s3://support.elasticmapreduce/spark/install-spark

Step 3 - Click on create cluster

Your cluster will start in minutes :)