0 votes

I managed to set up Hadoop with 3 datanodes as a small cluster, and everything works OK. When trying to access an AWS bucket over the S3A protocol, I get this error:

hadoop fs -ls s3a://my-bucket/

-ls: Fatal internal error
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2395)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3208)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3240)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3291)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3259)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:470)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:245)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:228)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:317)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:380)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2299)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2393)
    ... 16 more

What did I do wrong? How do I fix it?

P.S. The bucket on Amazon is fully public. Anyone can download from it.

Amazon credentials were configured in hadoop/core-site.xml as described here: Hadoop-AWS module: Integration with Amazon Web Services.
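For reference, the same S3A settings can also be supplied per command (a minimal sketch using the standard Hadoop-AWS property names; my-bucket and the key values are placeholders; a fully public bucket can instead use the anonymous credentials provider):

# Sketch: supplying S3A settings on the command line (placeholder values).
hadoop fs \
    -D fs.s3a.access.key=YOUR_ACCESS_KEY \
    -D fs.s3a.secret.key=YOUR_SECRET_KEY \
    -ls s3a://my-bucket/

# For a fully public bucket, anonymous access needs no credentials at all:
hadoop fs \
    -D fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \
    -ls s3a://my-bucket/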

As per the link you shared, the issue seems to be related to a JAR file missing from the CLASSPATH. Can you check if it is accessible? If it is not, can you copy the required JARs as shown below, matching your Hadoop version, and retry? Also try replacing s3a with s3 and see if it helps. sudo cp hadoop/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar hadoop/share/hadoop/common/lib/ sudo cp hadoop/share/hadoop/tools/lib/hadoop-aws-2.7.5.jar hadoop/share/hadoop/common/lib/ - Prabhakar Reddy
s3://my-bucket -> No FileSystem for scheme "s3", s3n://my-bucket -> No FileSystem for scheme "s3n". After copying the jars - no changes :( same exception - Jasper
How can I add the folder /usr/local/hadoop/share/hadoop/tools/lib/ to Hadoop's CLASSPATH? (see the classpath sketch after these comments) - Jasper
I would recommend using EMR instead. Much easier to get a cluster running - OneCricketeer
@bdcloud - could you post an answer? After I copied the JAR files to the correct location it started to work. You are right - Jasper
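On the classpath question above: a minimal sketch, assuming the /usr/local/hadoop layout from the comments and settings placed in hadoop-env.sh or the shell; HADOOP_OPTIONAL_TOOLS is only available on newer Hadoop releases:

# Option 1: put the tools JARs on the classpath:
export HADOOP_CLASSPATH="/usr/local/hadoop/share/hadoop/tools/lib/*:$HADOOP_CLASSPATH"

# Option 2 (newer Hadoop releases): enable the optional hadoop-aws tools module:
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"

# Verify the AWS JARs are now visible to Hadoop:
hadoop classpath --glob | tr ':' '\n' | grep -E 'hadoop-aws|aws-java-sdk'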

1 Answer

-1 votes

As per the link you shared, the issue seems to be related to a JAR file missing from the CLASSPATH. Check whether it is accessible. If it is not, copy the required JARs as shown below, matching your Hadoop version, and retry.

# In this cluster the JARs were aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.5.jar;
# substitute the versions shipped with your Hadoop distribution.
sudo cp hadoop/share/hadoop/tools/lib/aws-java-sdk-$AWS_SDK_VERSION.jar hadoop/share/hadoop/common/lib/
sudo cp hadoop/share/hadoop/tools/lib/hadoop-aws-$HADOOP_VERSION.jar hadoop/share/hadoop/common/lib/
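To find the exact filenames before copying, and to confirm the fix afterwards (a sketch, assuming the /usr/local/hadoop install path mentioned in the comments):

# List the AWS JARs shipped with this Hadoop installation:
ls /usr/local/hadoop/share/hadoop/tools/lib/ | grep -E 'hadoop-aws|aws-java-sdk'

# After copying, the original command should now resolve S3AFileSystem:
hadoop fs -ls s3a://my-bucket/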