The solution is actually quite simple.
Firstly, EMR clusters have two roles:
These roles are explained in: Default IAM Roles for Amazon EMR
Therefore, each EC2 instance launched in the cluster is assigned the EMR_EC2_DefaultRole role, which makes temporary credentials available via the Instance Metadata service. (For an explanation of how this works, see: IAM Roles for Amazon EC2.) Amazon EMR nodes use these credentials to access AWS services such as S3, SNS, SQS, CloudWatch and DynamoDB.
Secondly, you will need to add permissions to the Amazon S3 bucket in the other account to permit access via the EMR_EC2_DefaultRole role. This can be done by adding a bucket policy to the S3 bucket (here named other-account-bucket) like this:
{
"Id": "Policy1",
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1",
"Action": "s3:*",
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::other-account-bucket",
"arn:aws:s3:::other-account-bucket/*"
],
"Principal": {
"AWS": [
"arn:aws:iam::ACCOUNT-NUMBER:role/EMR_EC2_DefaultRole"
]
}
}
]
}
This policy grants all S3 permissions (s3:*) to the EMR_EC2_DefaultRole role that belongs to the account matching the ACCOUNT-NUMBER in the policy, which should be the account in which the EMR cluster was launched. Be careful when granting such permissions -- you might want to grant permissions only to GetObject rather than granting all S3 permissions.
That's all! The bucket in the other account will now accept requests from the EMR nodes because they are using the EMR_EC2_DefaultRole role.
Disclaimer: I tested the above by creating a bucket in Account-A and assigning permissions (as shown above) to a role in Account-B. An EC2 instance was launched in Account-B with that role. I was able to access the bucket from the EC2 instance via the AWS Command-Line Interface (CLI). I did not test it within EMR, however it should work the same way.
fs.s3n.awsAccessKeyIdonly applies to files accessed vias3n://bucket/file. That might not be how your system is reading from S3. See: Hadoop-AWS module: Integration with Amazon Web Services. - John Rotenstein