
sup y'all

in python, this executes no problem:

sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "...")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "...")
sc.textFile("s3a://path").count()

someBigNumber

in scala, i get a 403:

sc.hadoopConfiguration.set("fs.s3a.access.key", "...")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")
sc.textFile("s3a://path").count()

StackTrace: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: ...)

why?

this is all spark 2.0.

thanks


2 Answers

  1. Try setting the properties before you create the SparkContext, e.g. on the SparkConf: set "spark.hadoop.fs.s3a..." = value.
  2. Spark tries to be clever in spark-submit and copies the AWS_ environment variables into the s3a and s3n properties prior to submission, even if those properties are already set. This can stamp on your settings. Look at the env vars, verify they're correct, and maybe unset them (or try setting them).
  3. S3A goes through an auth sequence: try the Hadoop properties; try the env vars in the destination process; try the EC2 IAM role (the exact checks and ordering depend on the Hadoop JAR version). It may be that something at the far end is causing fun here.
  4. There's another emergency option, which is pretty insecure: putting the username:password in the URL, such as s3a://AAID43ss:1356@bucket/path. This doesn't work on Hadoop < 2.8 if there is a / in the secret, and your secrets get logged to the console. Use carefully. Update: this was cut from Hadoop 3.2 after many years of warning users to stop it.
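A minimal sketch of option 1, assuming Spark 2.0 with the s3a connector on the classpath; the app name, bucket path, and keys are placeholders. Keys set via the spark.hadoop. prefix on the SparkConf are copied into the Hadoop configuration when the SparkContext is created, so the s3a filesystem sees them on first use:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Set the s3a credentials on the SparkConf *before* the SparkContext
// exists; the "spark.hadoop." prefix routes them into hadoopConfiguration.
val conf = new SparkConf()
  .setAppName("s3a-auth-sketch") // placeholder app name
  .set("spark.hadoop.fs.s3a.access.key", "...")
  .set("spark.hadoop.fs.s3a.secret.key", "...")

val sc = new SparkContext(conf)

// First s3a access now picks up the credentials set above.
sc.textFile("s3a://bucket/path").count()
```

This avoids the race in the question's snippet, where the context (and possibly the filesystem cache) already exists before the keys land in the configuration.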

Trying to debug auth problems is a PITA, as the code deliberately avoids having useful debug statements: we don't dare log the properties.

You may find something useful in the Troubleshooting S3A section of the Hadoop docs. Do bear in mind that this covers later versions of Hadoop; some things mentioned there won't be valid.

Enjoy

Steve L (currently working on the S3A code)


It means that in this case Python and Scala are "incompatible" and Scala doesn't have access to amazonaws. Maybe the key is different and you have a typo in the Scala code, or maybe Scala no longer works with amazonaws because amazonaws changed.