I have a scenario where I am loading 40 files with different name patterns from a directory into Hive tables using HCatStorer.
Directory : opt/inputfolder/
Input Files Pattern :
inp1*.log,
inp2*.log,
.....
inp39*.log,
inp40*.log.
I have written a Pig script that reads the files for all 40 patterns.
My problem is that the script treats all 40 patterns as mandatory, but I may not receive some of the files. In that case, I get an exception stating:
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Input Pattern opt/ip_files/inp16*.log matches 0 files
Is there any way to handle this exception?
I want to read the files for the remaining 39 patterns even when the files for one pattern are not present.
What if my source files are identified by names rather than numbers (e.g. banana_2014012.log, orange_2014012.log, apple_2014012.log)?
The following is my approach for loading data from these files into Hive tables using HCatStorer.
*** Pseudo code ****
banana_src = LOAD 'banana_*.log' USING PigStorage();
......
STORE banana_src INTO 'BANANA' USING HCatStorer();
apple_src = LOAD 'apple_*.log' USING PigStorage();
......
STORE apple_src INTO 'APPLE' USING HCatStorer();
orange_src = LOAD 'orange_*.log' USING PigStorage();
......
STORE orange_src INTO 'ORANGE' USING HCatStorer();
If any of the sources has no matching files, the Pig script throws an error saying the input pattern matches 0 files, and the whole script ends in the FAILED state. Even when one source file is not available, I want my script to load the other tables without failing the job.
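To make the desired behaviour concrete, here is a minimal sketch of a pre-check that could run before the Pig script and report which sources actually have files. It is only an illustration under assumptions: the function name `sources_with_files` and the table-to-pattern mapping are hypothetical, and it globs a local directory, whereas files in HDFS would need an equivalent HDFS listing instead. It is not something Pig or HCatStorer provides.

```python
import glob
import os
import tempfile


def sources_with_files(input_dir, patterns):
    """Return {table: [matching files]} for patterns that match at least one file.

    `patterns` maps a (hypothetical) target table name to a glob pattern.
    Patterns with no matches are simply left out of the result.
    """
    present = {}
    for table, pattern in patterns.items():
        matches = glob.glob(os.path.join(input_dir, pattern))
        if matches:
            present[table] = sorted(matches)
    return present


# Demo on a temporary local directory where the orange file is missing.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("banana_2014012.log", "apple_2014012.log"):
        open(os.path.join(tmp, fname), "w").close()

    patterns = {
        "BANANA": "banana_*.log",
        "APPLE": "apple_*.log",
        "ORANGE": "orange_*.log",
    }
    found = sources_with_files(tmp, patterns)
    found_tables = sorted(found)
    print(found_tables)
```

In this demo only BANANA and APPLE are reported, so only their LOAD/STORE pairs would need to run; how to skip the ORANGE pair inside the Pig script itself is the part I am asking about.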
Thanks.