Loading data into Partitions in Hive

Question

Please help me clarify my doubt. I am not sure about the purpose of partitioning in Hive. Here is what I am trying to do. Below is my data file:

File:

kishore,31
ramesh,32
kishore,33
ramesh,34

I created a Partitioned managed table EMP as shown below:

create table EMP (name string,age int) partitioned by (country string,state string) row format delimited fields terminated by ',';

Now I am loading the data as shown below:

load data local inpath '/../../file' into table EMP partition (country = 'US', state = 'Oklahoma');

So now my table with data should look like this:

kishore,31,US,Oklahoma
ramesh,32,US,Oklahoma
kishore,33,US,Oklahoma
ramesh,34,US,Oklahoma

My question is: how was partitioning useful here? Suppose I had a non-partitioned table with country and state as regular columns. Whether I run select * from EMP (on the non-partitioned table) or select * from EMP where country = 'US' and state = 'Oklahoma' (on the partitioned table), I get the same result. It is one and the same thing. How is the performance improved?

Thanks!

viru viru · Accepted Answer · 2015-12-01T00:32:25

Check this link to better understand partitioning in Hive:

http://www.brentozar.com/archive/2013/03/introduction-to-hive-partitioning/

The crux is:

Optimized storage of large data sets (you have to specify the partition keys).
The partition keys are chosen based on your querying patterns.
There are static and dynamic partitioning options.
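
To make the points above concrete, here is a minimal HiveQL sketch built on the EMP table from the question. It is an illustration under stated assumptions, not a tuned setup: emp_staging is a hypothetical non-partitioned table with columns (name, age, country, state), and the load path is left exactly as it appears in the question.

```sql
-- Static partition: the partition values are fixed in the statement,
-- exactly as in the question.
LOAD DATA LOCAL INPATH '/../../file'
INTO TABLE EMP PARTITION (country = 'US', state = 'Oklahoma');

-- List the partitions Hive knows about. Each partition is stored as its own
-- directory under the table location, e.g. .../emp/country=US/state=Oklahoma/
SHOW PARTITIONS EMP;

-- Filtering on the partition columns lets Hive read only the matching
-- directory instead of scanning every file in the table.
SELECT name, age FROM EMP WHERE country = 'US' AND state = 'Oklahoma';

-- Dynamic partitions: Hive derives the partition values from the trailing
-- columns of the SELECT. emp_staging is a hypothetical staging table.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT INTO TABLE EMP PARTITION (country, state)
SELECT name, age, country, state FROM emp_staging;
```

With a single partition, as in the question, the result and the cost are essentially the same as for a non-partitioned table. The difference shows up once the table holds many country/state partitions: a query that filters on those columns only touches the directories it needs, while a non-partitioned table has to be scanned in full and filtered row by row.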

Further reading: https://www.safaribooksonline.com/library/view/programming-hive/9781449326944/
