I'm running a Spark streaming app from my local to read from an S3 bucket.
I'm using the Hadoop-AWS jar to set S3 authentication parameters - https://hadoop.apache.org/docs/r3.0.0/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_with_S3
This is the error message 'Forbidden':
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: #####, AWS Error Code: null, AWS Error Message: Forbidden
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - HTTP Status Code: 403
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - AWS Error Code: null
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Type: Client
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Request ID: #####
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Class Name: com.amazonaws.services.s3.model.AmazonS3Exception
Code to read from bucket:
val sc: SparkContext = createSparkContext(scName)
val hadoopConf=sc.hadoopConfiguration
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
val ssc = new StreamingContext(sc, Seconds(time))
val lines = ssc.textFileStream("s3a://foldername/subfolder/")
lines.print()
I have set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN
variables on my terminal but it still gives me 'Forbidden'.
I am able to access S3 from the terminal though (using the AWS profile) so I'm not sure why it doesn't work when I go through Spark. Any ideas appreciated.
"fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"
only sets the filesystem. You also need to set the keys. See code here. stackoverflow.com/q/49230086/2308683 Also see cloudera.com/documentation/enterprise/latest/topics/… Search for "Specify the credentials at run time" – OneCricketeer