1
votes

I have a simple tab-separated file with a Pig schema. I am trying to load it and add two of its columns. When I load it using the "--schema" option of PigStorage, the addition fails with a ClassCastException. When I load it with '--noschema', the addition works fine. Why does Pig fail with the exception in the former case?

Here is the sample file, which contains a single line of tab-separated values:

a       1       1

The schema ".pig_schema" looks like:

{"fields":[{"name":"str","type":55,"description":"autogenerated from Pig Field Schema","schema":null},{"name":"score","type":15,"description":"autogenerated from Pig Field Schema","schema":null},{"name":"count","type":15,"description":"autogenerated from Pig Field Schema","schema":null}],"version":0,"sortKeys":[],"sortKeyOrders":[]}

Here is the list of statements from the grunt shell:

a1 = load '/local/workplace/data' using PigStorage(); --load with schema
describe a1; -- a1: {str: chararray,score: long,count: long}
b1 = foreach a1 generate score + count;
dump b1; -- throws exception
a2 = load '/local/workplace/data' using PigStorage('\t', '--noschema') as (str:chararray, score:long, count: long);
b2 = foreach a2 generate score+count; -- no exception
dump b2; -- works fine

The exception that is thrown is:

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [Add (Name: Add[long] - scope-34 Operator Key: scope-34) children: [[POProject (Name: Project[long][0] - scope-32 Operator Key: scope-32) children: null at []], [POProject (Name: Project[long][1] - scope-33 Operator Key: scope-33) children: null at []]] at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.genericGetNext(Add.java:100)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNextLong(Add.java:123)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:323)

Pig Version: 0.12.1

1
Yes, you are right; it looks like an issue in 0.12. The same commands work fine in version 0.13. - Sivasakthi Jayaraman

1 Answer

0
votes

By default, if you don't provide a schema, Pig treats every field as a bytearray.
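
To illustrate the bytearray default, here is a minimal sketch that loads the same file with no schema at all and casts each field explicitly before adding. The relation names `raw` and `summed` and the alias `total` are made up for illustration, and this has not been checked against 0.12.1 specifically, so treat it as a possible workaround rather than a confirmed fix:

```
-- Load without a schema: every field arrives as bytearray.
raw = load '/local/workplace/data' using PigStorage('\t', '-noschema');
-- Cast the second and third fields (positions $1 and $2) to long before adding.
summed = foreach raw generate (long)$1 + (long)$2 as total;
dump summed;
```

Because the casts are written out, the addition does not depend on whatever types the loader reports for the fields.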