I need to load the data for a single partition (date) in Pig. The data was created in Hive and is partitioned on date, so I want to load it in Pig via HCatalog.
The HCatalog documentation says that to load a particular partition in Pig, you first load the whole dataset and then filter it, e.g.:
a = load 'web_logs' using org.apache.hcatalog.pig.HCatLoader();
b = filter a by datestamp > '20110924';
https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore
But I am afraid this first loads the whole dataset into bag a and only then filters it in b. Am I correct?
In Hive (without HCatalog) this works: you can restrict the operation to just the partition you want, e.g.:
LOAD DATA INPATH 'filepath' INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
What is the equivalent of this construct in Pig with HCatalog?
Thanks!