How to create hierarchical partitions for batch data in Hive

Question

Consider the data for the year 2000.

test.csv

country_code,product_code,rpt_period
us,crd,2000
us,pcl,2000
us,mtg,2000
in,crd,2000
in,pcl,2000
in,mtg,2000

Now I am appending newly generated 2001 records to test.csv. After appending the new data, the file looks like this:

append.csv

country_code,product_code,rpt_period
us,crd,2000
us,pcl,2000
us,mtg,2000
in,crd,2000
in,pcl,2000
in,mtg,2000
us,crd,2001
us,pcl,2001
us,mtg,2001
in,crd,2001
in,pcl,2001
in,mtg,2001

Are the scenarios below possible in Hive? If yes, please answer the following questions.

1. How do I create the schema for a partitioned table Foo using this data? I want country_code and product_code as the partition columns.
2. How do I load the records from test.csv into table Foo using the Hive LOAD DATA command?
3. How do I load only the 2001 records from append.csv into table Foo? This also needs to be done using the Hive LOAD DATA command.

Thanks.

Sathiyan S · Accepted Answer · 2017-01-16T08:07:08

Yes, all the scenarios you have mentioned are possible with Hive.

Create a temp (staging) table and load all the data you have into it; then you can create the partitioned table with the two partition columns you mentioned.
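A minimal sketch of that setup, not a definitive implementation. The staging table name foo_staging, the file path, and the column types are assumptions for illustration; only Foo, the column names, and the file names come from the question. Declaring country_code before product_code in PARTITIONED BY gives the hierarchical layout, with directories such as country_code=us/product_code=crd.

```sql
-- Staging table (hypothetical name) that mirrors the CSV layout.
-- skip.header.line.count makes Hive ignore the header row when reading.
CREATE TABLE foo_staging (
  country_code STRING,
  product_code STRING,
  rpt_period   INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');

-- LOAD DATA only moves or copies the file into the table's directory.
-- The path below is a placeholder.
LOAD DATA LOCAL INPATH '/path/to/test.csv' INTO TABLE foo_staging;

-- Partitioned target table. Partition columns appear only in
-- PARTITIONED BY, not in the regular column list.
CREATE TABLE Foo (
  rpt_period INT
)
PARTITIONED BY (country_code STRING, product_code STRING)
STORED AS TEXTFILE;
```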

For 2 and 3: the LOAD DATA command by itself will work for loading the files. However, if your intention is to load into the partitioned table, you have to go via the temp table: load the file into it, then insert from it into the partitioned table.
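One way this could look, assuming a staging table named foo_staging (a hypothetical name) with columns country_code, product_code, rpt_period already exists, along with a table Foo partitioned by (country_code, product_code). The file path is a placeholder. The staging step is needed because LOAD DATA places files as they are and does not split rows across partitions, so the split is done by a dynamic-partition INSERT.

```sql
-- Allow Hive to derive partition values from the data itself.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Question 2: with test.csv already loaded into foo_staging,
-- copy every row into Foo. The partition columns must come last in the
-- SELECT list, in the same order as in the PARTITION clause.
INSERT INTO TABLE Foo PARTITION (country_code, product_code)
SELECT rpt_period, country_code, product_code
FROM foo_staging;

-- Question 3: replace the staging contents with append.csv,
-- then insert only the new 2001 rows so the 2000 rows are not duplicated.
LOAD DATA LOCAL INPATH '/path/to/append.csv' OVERWRITE INTO TABLE foo_staging;

INSERT INTO TABLE Foo PARTITION (country_code, product_code)
SELECT rpt_period, country_code, product_code
FROM foo_staging
WHERE rpt_period = 2001;
```

Afterwards, SHOW PARTITIONS Foo; should list one entry per country and product combination, for example country_code=in/product_code=crd.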

Let me know if this is what you want; otherwise, update your question.
