I have a simple tab separated file with a pig schema that I am trying to load and add two columns. When I load using "--schema" option of PigStorage, the addition fails with a ClassCastException. When I load with '--noschema', the addition works fine. Why is Pig failing with the exception in the former case?
Here is the sample file with only 1 line of input with tab separated values:
a 1 1
The schema ".pig_schema" looks like:
{"fields":[{"name":"str","type":55,"description":"autogenerated from Pig Field Schema","schema":null},{"name":"score","type":15,"description":"autogenerated from Pig Field Schema","schema":null},{"name":"count","type":15,"description":"autogenerated from Pig Field Schema","schema":null}],"version":0,"sortKeys":[],"sortKeyOrders":[]}
Here is the list of statements from the grunt shell:
a1 = load '/local/workplace/data' using PigStorage(); --load with schema
describe a1; -- a1: {str: chararray,score: long,count: long}
b1 = foreach a1 generate score + count;
dump b1; -- throws exception
a2 = load '/local/workplace/data' using PigStorage('\t', '--noschema') as (str:chararray, score:long, count: long);
b2 = foreach a2 generate score+count; -- no exception
dump b2; -- works fine
The exception that is thrown is:
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [Add (Name: Add[long] - scope-34 Operator Key: scope-34) chi ldren: [[POProject (Name: Project[long][0] - scope-32 Operator Key: scope-32) children: null at []], [POProject (Name: Project[long][1] - scope-33 Op erator Key: scope-33) children: null at []]] at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Numb er at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.genericGetNext(Add.java:100) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNextLong(Add.java:123) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:323)
Pig Version: 0.12.1