Say I have a table in Hive named T1. It is partitioned by column dt, which is a date field. In the hive metastore, the directory structure has a folder by the name of the table T1, with subdirectories within - one folder for each date.
My objective is to copy the table's data into Amazon S3, while maintaining the directory structure. If I try to write table contents directly to S3 file as follows, the output is written as a single file and the directory structure is lost:
INSERT OVERWRITE DIRECTORY "s3://<DESTINATION>" SELECT * FROM T1;
Alternatively, if I try to copy the directory from HIVE-metatore directly to s3 using the command, the directory in its entirety is copied to S3, but the underlying files are not comma delimited anymore... it is some unreadable character instead:
s3-dist-cp --src=hdfs://<directory location> --dest=s3://<destination>
Can anyone help me accomplish this? Any suggestions or alternatives?