11
votes

After creating the Amazon S3 bucket, my_bucket, I created an Elastic MapReduce (EMR) cluster via the CLI:

aws emr create-cluster --name "Hive testing" --ami-version 3.3 --applications Name=Hive --use-default-roles --instance-type m3.xlarge --instance-count 3 --steps Type=Hive,Name="Hive Program",Args=[-d,INPUT=s3://my_bucket/input,-d,OUTPUT=s3://my_bucket/input,-d,LIBS=s3://my_bucket/serde_libs]

Note that I did not specify a Hive *.q file. Once the S3 bucket and the EMR cluster exist, I plan to log onto the EMR box and run Hive interactively.

Note: I'm assuming there's an EMR box I can log onto.

However, when I ran aws emr describe-cluster --cluster-id XYZ, I saw this error in the output:

    "State": "TERMINATED_WITH_ERRORS",
    "StateChangeReason": {
        "Message": "EMR service role arn:aws:iam::xyz:role/EMR_DefaultRole is invalid",
        "Code": "VALIDATION_ERROR"
    }

What would cause this error? Do I need to open permissions on the S3 bucket for the EMR cluster to access it?

2
Did you ever figure this out? - C8H10N4O2
Your policy is not working. The cause could be on the IAM side (an old user, etc.). Check whether you can create even a simple cluster: set one up with the updated UI on the AWS EMR create-cluster page and, once the cluster reaches Waiting status, export the aws emr options with the CLI export tool. - HoofarLotusX

2 Answers

20
votes

The issue is not with the bucket but that the expected IAM role is missing.

See http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-creatingroles.html#emr-iam-roles-createdefaultwithcli

Issue the AWS CLI command:

aws emr create-default-roles 

Then create the cluster again. This is a one-time step needed to create the default roles.
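To confirm that a failed cluster really hit this role problem before re-creating it, the describe-cluster output can be checked programmatically. The following is a minimal sketch, not an official tool: the function name `role_validation_failure` is hypothetical, and the `Cluster` / `Status` wrapper around the fragment quoted in the question is assumed from the usual shape of `aws emr describe-cluster` output.

```python
import json


def role_validation_failure(describe_output):
    """Return the failure message if the cluster terminated because of a
    role validation error, otherwise None.

    `describe_output` is the parsed JSON from `aws emr describe-cluster`.
    The Cluster -> Status nesting is an assumption about that output.
    """
    status = describe_output.get("Cluster", {}).get("Status", {})
    reason = status.get("StateChangeReason", {})
    message = reason.get("Message", "")
    if (
        status.get("State") == "TERMINATED_WITH_ERRORS"
        and reason.get("Code") == "VALIDATION_ERROR"
        and "role" in message.lower()
    ):
        return message
    return None


# Sample built from the fragment quoted in the question.
SAMPLE = json.loads("""
{
  "Cluster": {
    "Status": {
      "State": "TERMINATED_WITH_ERRORS",
      "StateChangeReason": {
        "Message": "EMR service role arn:aws:iam::xyz:role/EMR_DefaultRole is invalid",
        "Code": "VALIDATION_ERROR"
      }
    }
  }
}
""")

if __name__ == "__main__":
    message = role_validation_failure(SAMPLE)
    if message:
        print("Role problem:", message)
        print("Try: aws emr create-default-roles")
```

In practice the input would come from saving the describe-cluster output to a file and loading it with `json.load`.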

  • note: make sure you are using a recent version of the AWS CLI; I had problems with 1.4 (the Debian jessie package)

  • note 2: taken from the mrjob code and Amazon announcements:

    instance profile and service role are required for accounts created after April 6, 2015, and will eventually be required for all accounts

0
votes

I've seen this error come up when you create a custom service role and its trust policy names the wrong service principal.

This example will generate that error:

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Action": "sts:AssumeRole",
       "Principal": {
         "Service": "ec2.amazonaws.com"
       },
       "Effect": "Allow",
       "Sid": "Invalid"
     }
   ]
}

This example will not:

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Action": "sts:AssumeRole",
       "Principal": {
         "Service": "elasticmapreduce.amazonaws.com"
       },
       "Effect": "Allow",
       "Sid": "Valid"
     }
   ]
}
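The difference between the two documents can also be checked in code. This is a rough sketch under stated assumptions: the helper names `trusted_services` and `emr_can_assume` are made up for illustration, and the policy is assumed to be already parsed into a dict (for an existing role it should be available as the `AssumeRolePolicyDocument` field returned by `aws iam get-role`).

```python
EMR_SERVICE = "elasticmapreduce.amazonaws.com"


def _as_list(value):
    """IAM allows a single string or a list in several fields; normalize to a list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def trusted_services(trust_policy):
    """Collect the service principals that an Allow statement lets call sts:AssumeRole."""
    services = set()
    for statement in _as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. a bare "*" principal, which names no service
        services.update(_as_list(principal.get("Service", [])))
    return services


def emr_can_assume(trust_policy):
    """True if the EMR service principal is trusted to assume the role."""
    return EMR_SERVICE in trusted_services(trust_policy)


def make_policy(service, sid):
    """Build a trust policy shaped like the two examples above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Principal": {"Service": service},
                "Effect": "Allow",
                "Sid": sid,
            }
        ],
    }


INVALID_POLICY = make_policy("ec2.amazonaws.com", "Invalid")
VALID_POLICY = make_policy("elasticmapreduce.amazonaws.com", "Valid")

if __name__ == "__main__":
    print("invalid example trusts EMR:", emr_can_assume(INVALID_POLICY))
    print("valid example trusts EMR:", emr_can_assume(VALID_POLICY))
```

The check only looks at the trust relationship; it says nothing about whether the role's permission policies are sufficient for EMR.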

For more info see here: http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-mgmt.pdf#emr-plan-access-iam