
I am trying to create an external table in Hive via Hue on AWS EMR:

CREATE EXTERNAL TABLE IF NOT EXISTS urls (
  id STRING,
  `date` TIMESTAMP,
  url STRING,
  expandedUrl STRING,
  domain STRING
) 
PARTITIONED BY (`year` INT, `month` INT, `day` INT)
STORED AS PARQUET LOCATION 's3://data/processed/urls/'
  • I have created an EMR cluster (emr-5.4.0) using the AWS console.
  • Logged into Hue
  • Run the above SQL
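(For reference, once the table exists the partitions under the S3 prefix still need to be registered before queries return any data. A hedged sketch, assuming the data is laid out in Hive-style `year=/month=/day=` directories — the dates below are placeholders:)

```sql
-- Assumes a layout like s3://data/processed/urls/year=2017/month=4/day=1/ (hypothetical)
MSCK REPAIR TABLE urls;

-- Or register a single partition explicitly:
ALTER TABLE urls ADD IF NOT EXISTS PARTITION (`year`=2017, `month`=4, `day`=1)
  LOCATION 's3://data/processed/urls/year=2017/month=4/day=1/';
```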

In the Metastore Manager I get the following error:

Cannot access: s3://data/processed/urls/. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "hadoop".

[Errno 22] Unknown scheme s3, available schemes: ['hdfs']

I also can't see s3 under the File Manager. I can access s3 from the master node using the CLI tools.

Are there any config options I am missing from the cluster creation? Do I need to give the Hue user additional permissions?

UPDATE

I have tried creating an hdfs user and a new database, as franklinsijo suggested.

I now get the same error on the database:

Cannot access: s3://data/processed.

[Errno 22] Unknown scheme s3, available schemes: ['hdfs']

When the CREATE DATABASE statement is run from the Hive CLI I get "Access Denied". I am using EMR_DefaultRole, which has both the AmazonElasticMapReduceRole and AmazonS3FullAccess policies attached.

UPDATE 2

I have worked through the issue with help from franklinsijo:

  • I can create databases and tables on S3 from both the Hive CLI and Hue.
  • I can read and write data from the tables.
  • I can't see the S3 Browser as described in http://gethue.com/introducing-s3-support-in-hue/
  • I can't access the table via 'Metastore Manager -> Database -> Table -> STATS -> Location'. I still get [Errno 22].
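(Regarding the missing S3 Browser: the blog post linked above describes enabling it by giving Hue an S3 account in hue.ini. A minimal sketch based on that post — all values below are placeholders, not my actual configuration:)

```ini
# hue.ini – enable Hue's S3 browser (placeholder credentials)
[aws]
  [[aws_accounts]]
    [[[default]]]
      access_key_id=YOUR_ACCESS_KEY_ID
      secret_access_key=YOUR_SECRET_ACCESS_KEY
      region=us-east-1
```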

1 Answer


[Errno 22] Unknown scheme s3, available schemes: ['hdfs']

This is because the location of Hive's default database is set to HDFS (refer here). Create a new Hive database with an S3 location.

CREATE DATABASE database_name LOCATION 's3://bucket/key';

Then create tables in this newly created database.
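A hedged sketch of the full sequence, reusing the DDL from the question (the database name `processed` is a placeholder):

```sql
-- Hypothetical database name; its LOCATION puts the database on S3 rather than HDFS
CREATE DATABASE IF NOT EXISTS processed LOCATION 's3://data/processed/';

USE processed;

CREATE EXTERNAL TABLE IF NOT EXISTS urls (
  id STRING,
  `date` TIMESTAMP,
  url STRING,
  expandedUrl STRING,
  domain STRING
)
PARTITIONED BY (`year` INT, `month` INT, `day` INT)
STORED AS PARQUET LOCATION 's3://data/processed/urls/';
```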

Cannot access: s3://data/processed/urls/. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "hadoop".

Create a new Hue user named hdfs with Superuser status in the Hue UI, then log in as that hdfs user to execute queries.