Scenario
I create an AWS IAM role called "my-role", specifying EC2 as the trusted entity, i.e. using the following trust relationship policy document:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
The role has the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:GetBucketAcl",
        "s3:GetBucketCORS",
        "s3:GetBucketLocation",
        "s3:GetBucketLogging",
        "s3:GetBucketNotification",
        "s3:GetBucketPolicy",
        "s3:GetBucketRequestPayment",
        "s3:GetBucketTagging",
        "s3:GetBucketVersioning",
        "s3:GetBucketWebsite",
        "s3:GetLifecycleConfiguration",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectTorrent",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionAcl",
        "s3:GetObjectVersionTorrent",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectVersionAcl",
        "s3:RestoreObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
I launch an EC2 instance (Amazon Linux 2014.09.1) from the command line using the AWS CLI, specifying "my-role" as the instance profile, and everything works out fine. I verify that the instance effectively assumes "my-role" by querying the instance metadata:

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/

which returns my-role. Then I run:

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/my-role

which returns the temporary credentials associated with "my-role". An example of such a credentials response is something like:
{
  "Code" : "Success",
  "LastUpdated" : "2015-01-19T10:37:35Z",
  "Type" : "AWS-HMAC",
  "AccessKeyId" : "an-access-key-id",
  "SecretAccessKey" : "a-secret-access-key",
  "Token" : "a-token",
  "Expiration" : "2015-01-19T16:47:09Z"
}
Finally, I verify that the instance can access the bucket by running:

aws s3 ls s3://my-bucket/

which correctly returns a list containing the first subdirectory(ies) under "my-bucket". (The AWS CLI comes installed and configured by default when launching this AMI; the EC2 instance and the S3 bucket are within the same AWS account.)
I install and run a Tomcat 7 server and container on that instance, on which I deploy a J2EE 1.7 servlet with no issues.
The servlet should download a file from an S3 bucket, specifically from s3://my-bucket/custom-path/file.tar.gz, to the local file system, using the Hadoop Java APIs. (Please note that I tried the hadoop-common artifact in versions 2.4.x, 2.5.x and 2.6.x with no positive results. Below I post the exception I get when using 2.5.x.)
Within the servlet, I retrieve fresh credentials from the instance metadata URL mentioned above and use them to configure my Hadoop Java API instance:
...
Path path = new Path("s3n://my-bucket/");
Configuration conf = new Configuration();
conf.set("fs.defaultFS", path.toString());
conf.set("fs.s3n.awsAccessKeyId", myAwsAccessKeyId);
conf.set("fs.s3n.awsSecretAccessKey", myAwsSecretAccessKey);
conf.set("fs.s3n.awsSessionToken", mySessionToken);
...
Obviously, myAwsAccessKeyId, myAwsSecretAccessKey, and mySessionToken are Java variables that I previously set with the actual values.
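For completeness, the credential retrieval itself is roughly the following (a simplified sketch of my actual code, with error handling omitted; here I parse the JSON with Jackson, hard-code the role name, and assume the three variables above are String fields of the servlet):

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Simplified sketch: read the temporary credentials from the instance metadata
// endpoint and store them in the fields used in the configuration above.
private void refreshCredentialsFromInstanceMetadata() throws IOException {
    String url = "http://169.254.169.254/latest/meta-data/iam/security-credentials/my-role";
    try (InputStream in = new URL(url).openStream()) {
        JsonNode json = new ObjectMapper().readTree(in);
        myAwsAccessKeyId     = json.get("AccessKeyId").asText();
        myAwsSecretAccessKey = json.get("SecretAccessKey").asText();
        mySessionToken       = json.get("Token").asText();
    }
}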
Then I actually get a FileSystem instance, using:
FileSystem fs = path.getFileSystem(conf);
I am able to retrieve all the configuration related to the FileSystem (with fs.getConf().get("key-name")) and verify that everything is configured as assumed.
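(The check is nothing more than dumping the relevant keys, roughly like this:)

// Sketch of the sanity check: print the relevant keys and compare them
// with the values set above.
for (String key : new String[] { "fs.defaultFS",
                                 "fs.s3n.awsAccessKeyId",
                                 "fs.s3n.awsSecretAccessKey",
                                 "fs.s3n.awsSessionToken" }) {
    System.out.println(key + " = " + fs.getConf().get(key));
}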
Problem
I cannot download s3://my-bucket/custom-path/file.tar.gz using:
...
fs.copyToLocalFile(false, new Path(path.toString()+"custom-path/file.tar.gz"), outputLocalPath);
...
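Putting the pieces together, the relevant servlet code is essentially the following condensed sketch of what I described above (the local destination path here is just an example):

// Condensed sketch of the whole download step described above (simplified).
Path path = new Path("s3n://my-bucket/");
Configuration conf = new Configuration();
conf.set("fs.defaultFS", path.toString());
conf.set("fs.s3n.awsAccessKeyId", myAwsAccessKeyId);
conf.set("fs.s3n.awsSecretAccessKey", myAwsSecretAccessKey);
conf.set("fs.s3n.awsSessionToken", mySessionToken);

FileSystem fs = path.getFileSystem(conf);
Path remote = new Path(path, "custom-path/file.tar.gz");
Path outputLocalPath = new Path("/tmp/file.tar.gz");   // example local destination
fs.copyToLocalFile(false, remote, outputLocalPath);    // this is the call that fails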
If I use hadoop-common 2.5.x, I get the IOException:
org.apache.hadoop.security.AccessControlException: Permission denied: s3n://my-bucket/custom-path/file.tar.gz
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:449)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at org.apache.hadoop.fs.s3native.$Proxy12.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:467)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1968)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1937)
    ...
If I use hadoop-common 2.4.x, I get a NullPointerException:
java.lang.NullPointerException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1968)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1937)
    ...
Just for the record, if I DON'T set any AWS credentials, I get:
AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
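(For clarity, the two forms that message refers to correspond, schematically, to something like:)

// Schematic illustration of the two forms mentioned in the message above:
// 1) credentials embedded in the s3n URI (as in the CLI test in the final notes below)
Path withEmbeddedCreds =
    new Path("s3n://" + myAwsAccessKeyId + ":" + myAwsSecretAccessKey + "@my-bucket/");
// 2) credentials set as configuration properties (what I actually do above)
conf.set("fs.s3n.awsAccessKeyId", myAwsAccessKeyId);
conf.set("fs.s3n.awsSecretAccessKey", myAwsSecretAccessKey);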
Final notes
- If I try to download the file from the very same URI (but with s3 in place of s3n) using AWS CLI commands from the instance, I have NO PROBLEMS AT ALL.
- If I try to download a Hadoop distribution (like 2.4.1 from https://archive.apache.org/dist/hadoop/core/hadoop-2.4.1/), unzip it, retrieve the temporary AWS credentials from the instance metadata URL, and try to run
<hadoop-dir>/bin/hadoop fs -cp s3n://<aws-access-key-id>:<aws-secret-access-key>@my-bucket/custom-path/file.tar.gz .
I get, once again, an NPE:
Fatal internal error
java.lang.NullPointerException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:479)
    at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
    at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
    at org.apache.hadoop.fs.shell.Ls.processPathArgument(Ls.java:96)
    at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
    at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
    at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:308)
Sorry for the long post, I just tried to be as detailed as I could. Thanks for any help.