I am trying to read a native CSV file from an S3 bucket using Spark with Scala, running locally. I am able to read the file using the http protocol, but I intend to use the s3a protocol.
Below is the configuration I set up before the call:

    spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "Mykey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "Mysecretkey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider")
    spark.sparkContext.hadoopConfiguration.set("com.amazonaws.services.s3.enableV4", "true")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "eu-west-1.amazonaws.com")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl.disable.cache", "true")
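
For reference, the read call that then fails is roughly the following (the bucket and path here are placeholders, not my real ones):

    // "my-bucket" and the object key below are placeholders
    val df = spark.read
      .option("header", "true")
      .csv("s3a://my-bucket/data/input.csv")
    df.show()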

I am getting the below exception:

    Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException:
    Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2154)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2580)

My versions:

Spark: 2.3.1
Scala: 2.11
aws-java-sdk: 1.11.336
hadoop-aws: 2.8.4

2 Answers

It's the exception you get when the S3 SDK library is missing; more detail can be found at https://community.hortonworks.com/articles/25523/hdp-240-and-spark-160-connecting-to-aws-s3-buckets.html

Basically, when you see a ClassNotFoundException, it is caused by a binary file missing from your JVM classpath: either the root classloader loads classes from the Java runtime directory and your application's present directory, or an external classloader loads them from some given path. Check those paths carefully. Maybe you need to read more documentation about ClassLoader; google it :)
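
As a quick check you can probe the classpath directly from your driver code or the Spark shell. This is only a diagnostic sketch, using the class name from your stack trace:

    // Diagnostic sketch: if this throws, the hadoop-aws JAR (which contains
    // S3AFileSystem) is missing from the JVM classpath, and Spark will fail
    // the same way when it resolves fs.s3a.impl.
    try {
      Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")
      println("hadoop-aws is on the classpath")
    } catch {
      case _: ClassNotFoundException =>
        println("hadoop-aws is missing from the classpath")
    }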

Important: Classpath setup

  1. The S3A connector is implemented in the hadoop-aws JAR. If it is not on the classpath: stack trace.
  2. Do not attempt to mix a hadoop-aws version with other Hadoop artifacts from different versions. They must be from exactly the same release. Otherwise: stack trace.
  3. The S3A connector depends on the AWS SDK JARs. If they are not on the classpath: stack trace.
  4. Do not attempt to use an Amazon S3 SDK JAR different from the one the Hadoop version was built with. Otherwise: a stack trace is highly likely.
  5. The normative list of dependencies of a specific version of the hadoop-aws JAR is stored in Maven and can be viewed on mvnrepository; an illustrative sbt fragment is sketched below the link.

https://cwiki.apache.org/confluence/display/HADOOP2/AmazonS3
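
As an illustrative sketch only (versions are the ones from the question, not a verified combination), a build definition that keeps the artifacts consistent might look like:

    // Illustrative build.sbt fragment: take the exact aws-java-sdk version
    // hadoop-aws 2.8.4 was built against from its POM on mvnrepository
    // rather than picking it independently.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-sql"  % "2.3.1",
      // must be from exactly the same release line as the Hadoop JARs Spark
      // is already using; it pulls in a matching aws-java-sdk transitively
      "org.apache.hadoop" %  "hadoop-aws" % "2.8.4"
    )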