6 votes

I would like to be able to use the ~/.aws/credentials file I maintain with different profiles in my Spark Scala application, if that is possible. I know how to set Hadoop configurations for s3a inside my app, but I don't want to keep hardcoding different keys and would rather use my credentials file the way I do with other programs. I've also experimented with the Java API, e.g. val credentials = new DefaultAWSCredentialsProviderChain().getCredentials(), and then creating an S3 client, but that doesn't let me use my keys when reading files from S3 with Spark. I also know that keys can go in core-site.xml when I run my app, but how can I manage different keys, and how can I set things up in IntelliJ so that different keys are pulled in for different profiles?
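For reference, this is roughly what I do today with hardcoded keys (the values shown are just placeholders):

// Current approach: hardcode the s3a keys in the Hadoop configuration (placeholder values)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "MY_ACCESS_KEY")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "MY_SECRET_KEY")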

2
I don't have time to look up links, but you can specify what file to use for credentials rather than using the default provider chain. - childofsoong
Thanks. Can you provide a little more detail about what you mean? Do you mean still using AmazonS3Client client = new AmazonS3Client(new ProfileCredentialsProvider("my_profile_name"))? I'm not sure how this would apply to reading S3 files with Spark using s3a:// - horatio1701d

2 Answers

3 votes

DefaultAWSCredentialsProviderChain on its own may not pick up the credentials you want here. Instead, you can build an explicit AWSCredentialsProviderChain with exactly the providers you need, e.g.:

import com.amazonaws.auth.{AWSCredentialsProviderChain, EnvironmentVariableCredentialsProvider}
import com.amazonaws.auth.profile.ProfileCredentialsProvider

// Try environment variables first, then fall back to the ~/.aws/credentials profiles
val awsCredentials = new AWSCredentialsProviderChain(
  new EnvironmentVariableCredentialsProvider(),
  new ProfileCredentialsProvider())

You can use these credentials with an S3 client directly or, since you mention Spark, set them in the Hadoop configuration:

hadoopConfig.set("fs.s3a.access.key", awsCredentials.getAWSAccessKeyId)
hadoopConfig.set("fs.s3a.secret.key", awsCredentials.getAWSSecretKey)

To switch between different AWS profiles you can then set the AWS_PROFILE environment variable before the chain is created. Happy to expand on any particular point if needed.
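If you prefer to choose the profile from inside the application rather than through AWS_PROFILE, ProfileCredentialsProvider also accepts a profile name. A small sketch along those lines (the fallback to "default" and the reuse of hadoopConfig are assumptions for illustration):

import com.amazonaws.auth.profile.ProfileCredentialsProvider

// Pick a profile explicitly, falling back to AWS_PROFILE or "default"
val profileName = sys.env.getOrElse("AWS_PROFILE", "default")
val profileCreds = new ProfileCredentialsProvider(profileName).getCredentials

hadoopConfig.set("fs.s3a.access.key", profileCreds.getAWSAccessKeyId)
hadoopConfig.set("fs.s3a.secret.key", profileCreds.getAWSSecretKey)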

0 votes

If you have the standard AWS_ environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) set, spark-submit will copy them over as the s3a secrets.

If you want to set a provider chain for S3A, you can provide a list of provider classes in the option fs.s3a.aws.credentials.provider. These get instantiated with a Configuration-argument constructor if one exists, otherwise with the empty constructor. The default list is: one that gets secrets from the URI or the configuration, one for environment variables, and finally one for EC2 IAM secrets. You can change the list to other existing providers (the anonymous provider, the session provider) or write your own...anything which implements com.amazonaws.auth.AWSCredentialsProvider is allowed.
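For example, to restrict S3A to the environment-variable and profile providers (a sketch; it assumes a Hadoop Configuration named hadoopConfig as in the other answer, and that the AWS SDK classes are on the classpath):

// Comma-separated list of credential provider classes for S3A to try, in order
hadoopConfig.set(
  "fs.s3a.aws.credentials.provider",
  "com.amazonaws.auth.EnvironmentVariableCredentialsProvider," +
    "com.amazonaws.auth.profile.ProfileCredentialsProvider")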

You should be able to set fs.s3a.aws.credentials.provider to com.amazonaws.auth.profile.ProfileCredentialsProvider and have it picked up locally (maybe you'll need your own wrapper which extracts the profile name from the configuration passed in). This will work on any host which has your credentials...it won't work if you only have local secrets and want to submit work elsewhere. It's probably simplest to set environment variables and have spark-submit propagate them.
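A rough sketch of such a wrapper, assuming (per the description above) that S3A will use a constructor taking a Hadoop Configuration; the config key fs.s3a.aws.profile.name is made up for this example:

import com.amazonaws.auth.{AWSCredentials, AWSCredentialsProvider}
import com.amazonaws.auth.profile.ProfileCredentialsProvider
import org.apache.hadoop.conf.Configuration

// Hypothetical wrapper: reads a profile name from a made-up config key and
// delegates to the SDK's ProfileCredentialsProvider for that profile
class ConfiguredProfileCredentialsProvider(conf: Configuration) extends AWSCredentialsProvider {
  private val profileName = conf.get("fs.s3a.aws.profile.name", "default")
  private val delegate = new ProfileCredentialsProvider(profileName)

  override def getCredentials: AWSCredentials = delegate.getCredentials
  override def refresh(): Unit = delegate.refresh()
}

You would then reference it by class name, e.g. hadoopConfig.set("fs.s3a.aws.credentials.provider", classOf[ConfiguredProfileCredentialsProvider].getName).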