
I am running Hive from a container (this image: https://hub.docker.com/r/bde2020/hive/) on my local machine.

I am trying to create a Hive table stored as a CSV in S3 with the following command:

CREATE EXTERNAL TABLE local_test (name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 's3://mybucket/local_test/';

However, I am getting the following error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.io.IOException No FileSystem for scheme: s3)

What is causing it? Do I need to set up something else?

Note: I am able to run aws s3 ls mybucket and also to create Hive tables in another directory, like /tmp/.


1 Answer


This problem is discussed here:

https://github.com/ramhiser/spark-kubernetes/issues/3

You need to add the AWS SDK and hadoop-aws jars to the Hive library path so that it can recognize the s3, s3n, and s3a file schemes; one way of doing that is sketched below.
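
A minimal sketch, assuming a Hadoop 2.7.x tarball layout inside the container; the jar versions and paths are illustrative, so match them to your own installation. Copying the jars into Hive's lib directory, or pointing hive.aux.jars.path / HIVE_AUX_JARS_PATH at them, are common ways to do it:

# copy the S3 filesystem implementations and the matching AWS SDK onto Hive's classpath
# (versions and locations are assumptions for a stock Hadoop 2.7.x install)
cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.7.4.jar $HIVE_HOME/lib/
cp $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar $HIVE_HOME/lib/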

Hope it helps.

EDIT1:

hadoop-aws-2.7.4 contains the implementations for interacting with those file systems. Inspecting the jar shows it has all the implementations needed to handle those schemes.

The org.apache.hadoop.fs package is where Hadoop looks up which file system implementation it needs for a given scheme.

The classes below are implemented in that jar:

org.apache.hadoop.fs.[s3|s3a|s3native]
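
To illustrate the lookup: Hadoop maps a URI scheme such as s3a to one of those classes through the fs.<scheme>.impl configuration properties (together with the service files bundled in the jar). The snippet below is only a sketch using standard Hadoop property names and placeholder credentials; it is not part of the original answer, and pointing the table LOCATION at s3a:// is a common variant of the same idea.

# illustrative: make the scheme-to-class mapping and the credentials explicit when starting Hive
hive --hiveconf fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
     --hiveconf fs.s3a.access.key=YOUR_ACCESS_KEY \
     --hiveconf fs.s3a.secret.key=YOUR_SECRET_KEY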

The only thing still missing is that the library is not being added to the Hive library path. Is there any way you can verify that the path is added to the Hive library path?
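
One way to check (a sketch; the jar location assumes a stock Hadoop 2.7.x layout) is to list the filesystem classes inside the jar and then print the auxiliary-jar settings the Hive side is picking up:

# confirm the jar really contains the s3/s3a/s3native implementations
jar tf $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.7.4.jar | grep 'org/apache/hadoop/fs/s3'

# check what Hive is configured to load
echo $HIVE_AUX_JARS_PATH
hive -e 'set hive.aux.jars.path;'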

EDIT2:

For a reference on setting the library path, see:

How can I access S3/S3n from a local Hadoop 2.6 installation?