I'm trying to save some test data to S3 from my local laptop using Java, and I'm getting the following error:
java.io.IOException: No FileSystem for scheme: s3a
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1443)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:209)
    at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:266)
    at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:489)
Below is my code:
private void testSaveToS3(SysS3Configuration s3Configuration) {
    try {
        Schema avroSchema = TestDTO.getClassSchema();
        Path path = new Path("s3a://" + s3Configuration.getBucketName() + "/test.parquet");

        // Pass the S3 credentials to the Hadoop filesystem layer
        Configuration config = new Configuration();
        config.set("fs.s3a.access.key", s3Configuration.getAccessKeyId());
        config.set("fs.s3a.secret.key", s3Configuration.getSecretKey());

        ParquetWriter<GenericData.Record> writer = AvroParquetWriter.<GenericData.Record>builder(path)
                .withSchema(avroSchema)
                .withConf(config)
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
                .build();

        List<TestDTO> list = new ArrayList<>();
        TestDTO l1 = new TestDTO();
        l1.setId(1);
        l1.setValue(11);
        TestDTO l2 = new TestDTO();
        l2.setId(2);
        l2.setValue(22);
        list.add(l1);
        list.add(l2);

        // Convert each DTO to an Avro record and write it out
        for (TestDTO d : list) {
            final GenericData.Record record = new GenericData.Record(avroSchema);
            record.put("id", d.getId());
            record.put("value", d.getValue());
            writer.write(record);
        }
        writer.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
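To rule out a missing jar, I also ran a quick diagnostic (just a sketch; `CheckS3A` is my own throwaway class, and the class name I'm probing for is the s3a implementation that hadoop-aws is supposed to provide):

```java
public class CheckS3A {
    public static void main(String[] args) {
        // The class that hadoop-aws should contribute for the s3a:// scheme
        String impl = "org.apache.hadoop.fs.s3a.S3AFileSystem";
        try {
            Class.forName(impl);
            System.out.println(impl + " is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println(impl + " is NOT on the classpath");
        }
    }
}
```

My understanding is that if this prints "NOT on the classpath", the jar simply isn't on the runtime classpath; if the class is found but the error persists, the scheme-to-class mapping itself may not be registered, in which case setting it explicitly via `config.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")` is something I've seen suggested (e.g. when fat-jar builds drop service/config registrations).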
I googled around but didn't find an answer. Any thoughts? Thanks in advance.
UPDATE:
- This is a Java application, and my local laptop doesn't have Hadoop installed.
- I have the following dependencies
compile 'com.amazonaws:aws-java-sdk:1.11.747'
compile 'org.apache.parquet:parquet-avro:1.8.1'
compile 'org.apache.hadoop:hadoop-aws:3.3.0'
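One thing I'm unsure about: hadoop-aws is only the S3 connector, and from what I've read it expects the matching Hadoop core artifacts (same version) alongside it, which my dependency list doesn't include. A sketch of what I understand a consistent set would look like (the parquet-avro version bump here is my assumption, not something I've verified):

```groovy
// sketch: hadoop-aws paired with a hadoop-common of the exact same version
compile 'org.apache.parquet:parquet-avro:1.11.1'
compile 'org.apache.hadoop:hadoop-common:3.3.0'
compile 'org.apache.hadoop:hadoop-aws:3.3.0'
```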
UPDATE: I changed the hadoop-aws version to 3.3.0 as suggested, but I still get the same error:
java.io.IOException: No FileSystem for scheme: s3a
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
......
I then tried changing the "s3a://" in my path string to "s3n://". Now I get a different error:
java.io.IOException: The s3n:// client to Amazon S3 is no longer available: please migrate to the s3a:// client
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
......
Any ideas, guys?