I am trying to create an external table in Hive via Hue on AWS EMR:
CREATE EXTERNAL TABLE IF NOT EXISTS urls (
id STRING,
`date` TIMESTAMP,
url STRING,
expandedUrl STRING,
domain STRING
)
PARTITIONED BY (`year` INT, `month` INT, `day` INT)
STORED AS PARQUET LOCATION 's3://data/processed/urls/'
- I have created an EMR cluster (emr-5.4.0) using the AWS console.
- Logged into Hue.
- Ran the above SQL.
In the Metastore Manager I get the following error:
Cannot access: s3://data/processed/urls/. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "hadoop".
[Errno 22] Unknown scheme s3, available schemes: ['hdfs']
I also can't see s3 under the File Manager. I can access s3 from the master node using the CLI tools.
Are there any config options I am missing from the cluster creation? Do I need to give the Hue user additional permissions?
UPDATE
I have tried creating an hdfs user and a new database, as franklinsijo suggested.
I now get the same error on the database:
Cannot access: s3://data/processed.
[Errno 22] Unknown scheme s3, available schemes: ['hdfs']
When the "create database" SQL is run from the Hive CLI I get "Access Denied".
I am using EMR_DefaultRole, which has both AmazonElasticMapReduceRole and AmazonS3FullAccess attached.
UPDATE 2
I have worked through the issue with help from franklinsijo:
- I can create databases and tables on S3 from both the Hive CLI and Hue.
- I can read and write data from the tables.
- I can't see the S3 Browser as detailed in http://gethue.com/introducing-s3-support-in-hue/
- I can't access the table via 'Metastore Manager -> Database -> Table -> STATS -> Location'. I still get [Errno 22].
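For reference, Hue's S3 Browser is normally enabled through an `[aws]` section in hue.ini. Below is a minimal sketch of that section. The key values and region are placeholders, and it is only an assumption that the Hue version bundled with emr-5.4.0 honors this section or that its absence is what produces the "Unknown scheme s3" error:

```ini
# hue.ini (sketch only; all values below are placeholders, not taken from the cluster)
[aws]
  [[aws_accounts]]
    [[[default]]]
      # Placeholder credentials for the account that owns the bucket
      access_key_id=YOUR_ACCESS_KEY_ID
      secret_access_key=YOUR_SECRET_ACCESS_KEY
      # Placeholder region; should match the bucket's region
      region=us-east-1
```

Hue would need to be restarted after a change like this for it to take effect.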