Please help me clear up a doubt. I am not sure about the purpose of partitioning in Hive. Here is what I am trying to do. Below is my data file:

File:
kishore,31
ramesh,32
kishore,33
ramesh,34

I created a Partitioned managed table EMP as shown below:

create table EMP (name string,age int) partitioned by (country string,state string) row format delimited fields terminated by ',';

Now I am loading the data as shown below:

load data local inpath '/../../file' into table EMP partition (country = 'US', state = 'Oklahoma');

So now my table with data should look like this:

kishore,31,US,Oklahoma
ramesh,32,US,Oklahoma
kishore,33,US,Oklahoma
ramesh,34,US,Oklahoma

MY QUESTION IS: how was partitioning useful here? Suppose I had a non-partitioned table with country and state as ordinary columns. Whether I run select * from EMP (on the non-partitioned table) or select * from EMP where country = 'US' and state = 'Oklahoma' (on the partitioned table), I get the same result. It is one and the same thing. How is the performance improved?

Thanks!

2 Answers

0
votes

Check this link to better understand partitioning in Hive:

http://www.brentozar.com/archive/2013/03/introduction-to-hive-partitioning/

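The crux is:

  1. Partitioning optimizes the storage of large data sets: you specify the partition keys, and Hive stores each partition's data in its own directory.
  2. The partition keys should be chosen based on your querying patterns, that is, the columns you filter on most often.
  3. Hive offers both static and dynamic partitioning.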
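
As a rough illustration of the static and dynamic options in point 3, here is a minimal HiveQL sketch against the EMP table from the question. The staging table `emp_staging` is a hypothetical name introduced only for this example; it is assumed to hold the raw rows together with their country and state values.

```sql
-- Hypothetical staging table (not part of the question): raw rows plus
-- the values that will become partition keys.
CREATE TABLE emp_staging (name STRING, age INT, country STRING, state STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Static partitioning: the partition values are spelled out in the statement,
-- so every selected row lands in the same partition.
INSERT INTO TABLE EMP PARTITION (country = 'US', state = 'Oklahoma')
SELECT name, age
FROM emp_staging
WHERE country = 'US' AND state = 'Oklahoma';

-- Dynamic partitioning: Hive takes the partition values from the trailing
-- columns of the SELECT, in the order the partition keys were declared.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE EMP PARTITION (country, state)
SELECT name, age, country, state
FROM emp_staging;
```

The static form is convenient when one load maps to one known partition, as with the load data statement in the question. The dynamic form is useful when a single source contains rows for many countries and states.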

Further reading: https://www.safaribooksonline.com/library/view/programming-hive/9781449326944/

0
votes

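You can use partition columns like ordinary columns in your WHERE clauses. Hive treats the partition keys as columns when printing the output of a SELECT statement (they appear after the regular columns, so column order matters here). However, HiveServer knows which columns are partition keys and which are not, and it uses that knowledge when translating the query into MapReduce jobs: a filter on a partition column lets Hive read only the matching partition directories instead of scanning the whole table.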
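
To see this partition awareness on the EMP table from the question, something like the following HiveQL can be tried. It is only a sketch: the exact plan text printed by EXPLAIN varies between Hive versions, so no output is shown here.

```sql
-- List the partitions Hive has registered for the table. Each one maps to
-- its own subdirectory under the table's location.
SHOW PARTITIONS EMP;

-- Filter on the partition keys: Hive can limit the scan to the single
-- matching partition directory.
SELECT name, age
FROM EMP
WHERE country = 'US' AND state = 'Oklahoma';

-- Filter on a regular column only: age is stored inside the data files,
-- so every partition has to be read.
SELECT name, age
FROM EMP
WHERE age > 32;

-- Print the query plan to inspect which partitions the query will touch.
EXPLAIN EXTENDED
SELECT name, age
FROM EMP
WHERE country = 'US' AND state = 'Oklahoma';
```

With a single partition, as in the question, both kinds of query read the same files, so no difference is visible. The benefit shows up once the table holds many countries and states and a query asks for only one of them.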