
Below is my code. I am trying to access S3 files from Spark locally, but I am getting this error: Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: s3n://bucketname/folder. I am also using the jars hadoop-aws-2.7.3.jar, aws-java-sdk-1.7.4.jar, and hadoop-auth-2.7.1.jar when submitting the Spark job from cmd.

package org.test.snow
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.log4j._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.Utils
import org.apache.spark.sql._
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

object SnowS3 {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("IDV4")
    val sc = new SparkContext(conf)
    val spark = new org.apache.spark.sql.SQLContext(sc)
    import spark.implicits._
    sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "A*******************A")
    sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "A********************A")
    val cus_1 = spark.read.format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("s3a://tb-us-east/working/customer.csv")
    cus_1.show()
  }
}

Any help would be appreciated. FYI: I am using Spark 2.1.

Are you able to access s3a://tb-us-east/working/customer.csv using the AWS CLI with the same credentials? - Michael West
@Michael West Actually I am confused. How can I test this with the AWS CLI? Do you mean I need an EMR cluster to test the Spark code? - user8613814
I meant testing the S3 access of your AWS credentials. If it is a credentials issue, then there may be no code issue. - Michael West
No, I am able to connect to Snowflake using those credentials. There is no problem with the credentials. - user8613814
I am able to run this code from an EMR cluster but not locally. No idea why :( - user8613814
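
As Michael West suggests above, it helps to confirm that the credentials themselves can reach the bucket before suspecting the Spark code. A minimal, hypothetical check through the Hadoop S3A connector (the bucket path and the FileSystem/Path classes come from the question; the S3AccessCheck object and reading the keys from environment variables are assumptions for illustration) might look like this:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object S3AccessCheck {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Assumption: the same keys used in the Spark job are exported as env vars.
    conf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    conf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Listing the prefix fails fast with a 403 if the credentials are rejected.
    val fs = FileSystem.get(new URI("s3a://tb-us-east/"), conf)
    fs.listStatus(new Path("s3a://tb-us-east/working/"))
      .foreach(status => println(status.getPath))
  }
}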

1 Answer


You shouldn't set that fs.s3a.impl option; it's a superstition that seems to persist in Spark examples.

Instead, use the S3A connector just by using the s3a:// prefix, together with:

  • consistent versions of the hadoop-* JARs. Yes, hadoop-aws-2.7.3 needs hadoop-common-2.7.3
  • the S3A-specific authentication options, fs.s3a.access.key and fs.s3a.secret.key (a minimal sketch follows below)
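
Putting the two points together, here is a minimal sketch of the job from the question with only the S3A settings changed. This is an illustration, not a drop-in fix: it assumes Spark 2.1 (as in the question) with matching hadoop-aws/hadoop-common JARs plus the corresponding aws-java-sdk on the classpath, and it uses Spark 2.x's built-in csv source instead of the external spark-csv package; the app name, masked keys, and bucket path are copied from the question.

package org.test.snow

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SnowS3 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("IDV4")
    val sc = new SparkContext(conf)
    val spark = new SQLContext(sc)

    // No fs.s3a.impl setting: the s3a:// scheme already maps to the S3A filesystem.
    // Use the S3A credential keys, not the old s3n-style property names.
    sc.hadoopConfiguration.set("fs.s3a.access.key", "A*******************A")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "A********************A")

    // Spark 2.x ships a built-in CSV source, so the spark-csv package is not required.
    val cus_1 = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("s3a://tb-us-east/working/customer.csv")
    cus_1.show()
  }
}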

If that doesn't work, look at the S3A troubleshooting docs.