
I have a Hive table TEST with this configuration:

create external table if not exists TEST ( ID bigint, ACTIVITY_ID string, BATCH_NBR ) PARTITIONED BY (year INT, month INT, day INT) CLUSTERED BY (BATCH_NBR) into 20 buckets ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/lake/hive/test';

The data files are stored in this location, and I can load them into the Hive table without any problems:

/user/lake/hive/test/2013/01/01/part-r-00001

Now, if I create another table STORE and insert data into it from the TEST table, the folder structure changes. After loading the same data, I expected the location of the STORE table to contain something like this:

/user/core/store/2014/07/03/batch123231.1313

But the actual location looks like this instead:

/user/core/store/year=2013/month=01/day=01/

I'm loading the STORE table from TEST with this query: insert overwrite table STORE select * from TEST;

How can I load that table and preserve the same folder structure in the destination?


1 Answer


An internal (managed) table in Hive follows Hive's own default folder structure under /apps/hive/warehouse, and it does not preserve the source folder structure when the data is loaded from an external Hive table. I was using an internal table for STORE, so it did not work as expected.
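For anyone who wants the plain year/month/day layout in the destination, one possible approach is to make STORE an external table and register each partition with an explicit location. The following is a minimal sketch, not a tested solution. It assumes STORE can be recreated with the same schema as TEST (via CREATE TABLE ... LIKE), reuses the partition values and paths from the question purely as an illustration, and loads one partition at a time. The data files inside the directory will most likely still get Hive-generated names rather than names like batch123231.1313.

```sql
-- Assumption: STORE is recreated as an external table that copies TEST's schema.
CREATE EXTERNAL TABLE IF NOT EXISTS STORE LIKE TEST
LOCATION '/user/core/store';

-- Register the partition with an explicit directory, so Hive uses it
-- instead of the default year=2013/month=1/day=1 path.
ALTER TABLE STORE ADD IF NOT EXISTS
PARTITION (year = 2013, month = 1, day = 1)
LOCATION '/user/core/store/2013/01/01';

-- Load only the matching rows into that partition (static partition insert).
INSERT OVERWRITE TABLE STORE PARTITION (year = 2013, month = 1, day = 1)
SELECT ID, ACTIVITY_ID, BATCH_NBR
FROM TEST
WHERE year = 2013 AND month = 1 AND day = 1;
```

Each additional day would need its own ADD PARTITION and INSERT pair, since a dynamic-partition insert would fall back to the year=/month=/day= naming for partitions that are not already registered.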