Say I have a table in Hive named T1. It is partitioned by column dt, which is a date field. In the Hive warehouse directory on HDFS, there is a folder named after the table, T1, with one subdirectory per date.
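
For illustration, the layout on HDFS looks roughly like this (the paths and dates are made up):

/user/hive/warehouse/t1/dt=2024-01-01/000000_0
/user/hive/warehouse/t1/dt=2024-01-02/000000_0
/user/hive/warehouse/t1/dt=2024-01-03/000000_0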

My objective is to copy the table's data to Amazon S3 while maintaining the directory structure. If I write the table's contents directly to S3 as follows, the output is written as a single file and the directory structure is lost:

INSERT OVERWRITE DIRECTORY 's3://<DESTINATION>' SELECT * FROM T1;

Alternatively, if I copy the directory from the Hive warehouse to S3 directly with the command below, the directory tree is copied in its entirety, but the underlying files are no longer comma delimited; the fields are separated by some unreadable character instead (presumably Hive's default Ctrl-A / \001 delimiter):

s3-dist-cp --src=hdfs://<directory location> --dest=s3://<destination>

Can anyone help me accomplish this? Any suggestions or alternatives?

Another option is to copy the files directly into S3 (not via Hive), then create an external table pointing to them. – John Rotenstein
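
A rough sketch of that suggestion, assuming the files have already been copied to S3 (the table name T1_s3 and the column definitions are placeholders):

create external table T1_s3 (col1 string, col2 string)
partitioned by (dt string)
location 's3://<DESTINATION>/';

-- register the dt=... folders that already exist under the location
msck repair table T1_s3;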

1 Answer


A possible solution is to create a table with the same schema, set its location to the desired destination, and then load the data using Hive with dynamic partitioning:

create table T2 like T1;

Alter table T2 set location 's3://<DESTINATION>';
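
The two statements above can also be combined, since create table ... like accepts a location clause (a sketch; the S3 path is a placeholder). Declaring the table external additionally means that dropping T2 later will not delete the data in S3:

create external table T2 like T1 location 's3://<DESTINATION>/';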

-- allow partitions to be created from the data; nonstrict permits all
-- partition columns to be dynamic
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- the dt values from the select determine the dt=... folders written under the table location
Insert overwrite table T2 partition (dt)
select * from T1
distribute by dt; -- routes each date's rows to a single reducer, limiting files per partition
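
Once the insert finishes, each date should appear as its own dt=<value> subdirectory under the destination. A quick way to confirm the partitions were registered:

show partitions T2;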