Parquet Hive table on S3

Question

I am attempting (unsuccessfully) to create a Parquet Hive table on S3.

create external table sequencefile_s3
(user_id bigint, 
creation_dt string
)
stored as sequencefile location 's3a://bucket/sequencefile';

The sequence file table works perfectly.

create external table parquet_s3
(user_id bigint,
creation_dt string)
stored as parquet location 's3a://bucket/parquet';

insert into parquet_s3
select * from hdfs_data;

The Parquet table does not work. The files are created in the S3 bucket/folder and select count(*) works, but select * from parquet_s3 limit 10 does not.

Other notes: I am running Cloudera distribution 5.8 outside AWS/EC2. S3A is properly configured (I can copy files through distcp, and the S3 sequencefile and textfile external tables work perfectly).

Alper t. Turker · Accepted Answer · 2018-10-15T03:48:54

First of all, the problem is not clearly described. What exactly fails?
Error logs are very important here: which command did you run, and what output did you get?
All I can say for now is that Hive has its own reader and writer libraries for SequenceFile.
It uses the SequenceFile input and output format classes from these packages:

org.apache.hadoop.mapred.SequenceFileInputFormat
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
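
To make that mapping concrete, the sequencefile table from the question can be written with the two classes spelled out. This is only a sketch of the long form of `stored as sequencefile`; the table name, columns and location are copied from the question, and nothing here needs to change for the sequence file table to keep working.

```sql
-- Long form of "stored as sequencefile": the input and output format
-- classes named above are given explicitly instead of via the shorthand.
create external table sequencefile_s3
(user_id bigint,
creation_dt string)
stored as
  inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
location 's3a://bucket/sequencefile';
```

Running `describe formatted sequencefile_s3;` shows which input and output format classes a table actually resolved to, which is a quick way to compare the working table with the Parquet one.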

Add the table property clause below when you create your Parquet table, then try again:

tblproperties ("parquet.compression"="SNAPPY");
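
As a sketch of where that clause goes, here is the Parquet DDL from the question with the property attached. The table name, columns and location are taken from the question. The key is spelled `parquet.compression`, which is the name the Parquet writer reads. Whether this resolves the failing `select *` is not confirmed, since no error log was posted.

```sql
-- Dropping an external table removes only the metadata;
-- the files already written under the S3 location are left in place.
drop table if exists parquet_s3;

create external table parquet_s3
(user_id bigint,
creation_dt string)
stored as parquet
location 's3a://bucket/parquet'
tblproperties ("parquet.compression"="SNAPPY");

-- Reload the data, then retry the query that failed.
insert into parquet_s3
select * from hdfs_data;

select * from parquet_s3 limit 10;
```

If recreating the table is not convenient, `alter table parquet_s3 set tblproperties ("parquet.compression"="SNAPPY");` sets the same property on the existing table. It only affects files written afterwards.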
