
I'm running a Spark Streaming app on my local machine that reads from an S3 bucket.

I'm using the hadoop-aws JAR to set the S3 authentication parameters: https://hadoop.apache.org/docs/r3.0.0/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_with_S3

The read fails with a 403 'Forbidden' error:

org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: #####, AWS Error Code: null, AWS Error Message: Forbidden
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - HTTP Status Code: 403
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - AWS Error Code: null
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Type: Client
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Request ID: #####
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Class Name: com.amazonaws.services.s3.model.AmazonS3Exception

Code to read from bucket:

val sc: SparkContext = createSparkContext(scName)
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
val ssc = new StreamingContext(sc, Seconds(time))
val lines = ssc.textFileStream("s3a://foldername/subfolder/")

I have set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables in my terminal, but I still get 'Forbidden'.

I can access S3 from the terminal (using the AWS profile), so I'm not sure why it fails when I go through Spark. Any ideas appreciated.

You need to export those variables to all executors, not just your local machine. - OneCricketeer
Are you located in any embargoed countries? - Mobin Ranjbar
@cricket_007 how do I do that? If I set those variables in hadoopConf, isn't that enough? - covfefe
@MobinRanjbar no, I'm not. - covfefe
"fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem" only sets the filesystem. You also need to set the keys. See code here: stackoverflow.com/q/49230086/2308683 Also see cloudera.com/documentation/enterprise/latest/topics/… Search for "Specify the credentials at run time". - OneCricketeer

To keep the keys out of the code in plain text, you can add a core-site.xml file containing the keys to the classpath.
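
A minimal sketch of what such a core-site.xml could look like, assuming the standard S3A property names from the hadoop-aws documentation. The values are placeholders, not real credentials:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Placeholder values: replace with your own credentials -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```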


Or, if you don't mind putting the keys directly in the code:

sc.hadoopConfiguration.set("fs.s3a.access.key", "...")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")
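
Since the question also sets AWS_SESSION_TOKEN, the credentials are probably temporary ones, and the access and secret keys alone may not be enough. Below is a rough sketch of how the three environment variables could be mapped to S3A properties. S3ACredentials and fromEnv are hypothetical names made up for illustration; the property names and the TemporaryAWSCredentialsProvider class come from the hadoop-aws documentation, and I'm assuming a hadoop-aws version that supports session credentials (2.8 or later, as far as I know).

```scala
// Hypothetical helper: translates AWS_* environment variables into fs.s3a.* properties.
object S3ACredentials {
  // Credential provider class for temporary (session) credentials in hadoop-aws.
  private val TemporaryProvider =
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"

  /** Returns the S3A settings for the given environment, or an empty map if keys are missing. */
  def fromEnv(env: Map[String, String]): Map[String, String] = {
    // Both the access key and the secret key are required.
    val base: Option[Map[String, String]] = for {
      access <- env.get("AWS_ACCESS_KEY_ID")
      secret <- env.get("AWS_SECRET_ACCESS_KEY")
    } yield Map(
      "fs.s3a.access.key" -> access,
      "fs.s3a.secret.key" -> secret
    )

    // The session token is optional; when present, switch to the temporary provider.
    val session: Map[String, String] = env.get("AWS_SESSION_TOKEN") match {
      case Some(token) =>
        Map(
          "fs.s3a.session.token" -> token,
          "fs.s3a.aws.credentials.provider" -> TemporaryProvider
        )
      case None => Map.empty[String, String]
    }

    base.map(_ ++ session).getOrElse(Map.empty[String, String])
  }
}

// Usage with the SparkContext from the question (needs Spark on the classpath):
//   S3ACredentials.fromEnv(sys.env).foreach { case (k, v) =>
//     sc.hadoopConfiguration.set(k, v)
//   }
```

Because the properties are set on sc.hadoopConfiguration in the driver, this should avoid relying on the environment variables being visible to the executors.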

The recommended way is to store the keys in a Java jceks credential file.
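
As a rough sketch of the jceks route: the keystore is usually created with the hadoop credential CLI, and Hadoop is then pointed at it through the hadoop.security.credential.provider.path property. JceksProvider and localUri below are hypothetical names for illustration, the keystore path is a placeholder, and I'm assuming a hadoop-aws version whose S3A connector reads its keys through credential providers.

```scala
// Hypothetical helper for pointing Hadoop at a local jceks keystore.
object JceksProvider {
  // Hadoop property that lists credential provider URIs.
  val ProviderPathKey = "hadoop.security.credential.provider.path"

  /** Builds a jceks provider URI for a keystore on the local filesystem. */
  def localUri(absolutePath: String): String = {
    require(absolutePath.startsWith("/"), "expected an absolute path")
    s"jceks://file$absolutePath"
  }
}

// Creating the keystore (run once in a shell; each command prompts for the value):
//   hadoop credential create fs.s3a.access.key -provider jceks://file/path/to/s3.jceks
//   hadoop credential create fs.s3a.secret.key -provider jceks://file/path/to/s3.jceks
//
// Usage with the SparkContext from the question (needs Spark on the classpath):
//   sc.hadoopConfiguration.set(
//     JceksProvider.ProviderPathKey,
//     JceksProvider.localUri("/path/to/s3.jceks"))
```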