1
votes

I want to load only specific partitions in PIG, the equivalent of what I want to do in hive:

hive>    
select trim(column) from db.table where partition_code like 'CALL_DETAIL_1905%';

I have tried matches in PIG

data= LOAD 'db.table' USING org.apache.hive.hcatalog.pig.HCatLoader();
onlyPartitions = FILTER data BY (partition_code matches '^CALL_DETAIL_1905.*');

I get the below error:

2016-09-18 01:07:18,098 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.semantic.analyzer.factory.impl does not exist java.io.IOException: MetaException(message:Error parsing partition filter; lexer error: null; exception NoViableAltException(11@[])) at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97) at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:61) at org.apache.hive.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:127) at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.processLoads(LoaderProcessor.java:105) at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.visitTezOp(LoaderProcessor.java:183) at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:249) at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:55) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:472) at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:182) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:304) at org.apache.pig.PigServer.launchPlan(PigServer.java:1431) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416) at org.apache.pig.PigServer.execute(PigServer.java:1405) at org.apache.pig.PigServer.executeBatch(PigServer.java:456) at org.apache.pig.PigServer.executeBatch(PigServer.java:439) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81) at org.apache.pig.Main.run(Main.java:502) at org.apache.pig.Main.main(Main.java:177) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: MetaException(message:Error parsing partition filter; lexer error: null; exception NoViableAltException(11@[])) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result$get_partitions_by_filter_resultStandardScheme.read(ThriftHiveMetastore.java)

The below works and I could populate all the partitions and send them as parameters, however, it would really help if I could use the equivalent of like operator. Please also note that it's only of the regex I have tried to match. I get the same error.

data= LOAD 'db.table' USING org.apache.hive.hcatalog.pig.HCatLoader();
onlyPartitions = FILTER data BY partition_code in ('CALL_DETAIL_19050001500');
1

1 Answers

1
votes

This works for my file. You may use the match criteria below as per your needs. That works similar to LIKE in oracle.

A = LOAD 'san.gdata_impala' using org.apache.hive.hcatalog.pig.HCatLoader();
B = FILTER A BY (UPPER($0) matches '.*(WINTERTON).*');