This is a very simple demo that reproduces the problem in Pig 0.11.

===testSchemaDATA===

1_a
2_b
3_c

The first script:

a = load 'testSchemaDATA' as (str:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as num;
a2 = foreach a1 generate (int)num as num;
dump a2;

This script is correct and dumps the expected answer:

1 2 3

The second script fails. The only difference between the two scripts is the schema declaration in the a1 statement:

a = load 'testSchemaDATA' as (str:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num,char);
a2 = foreach a1 generate (int)num as num;
dump a2;

It reports: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052: Cannot cast bytearray to int

I don't know how to explain this. Is it a bug?

1 Answer

This will work:

a = load 'testSchemaDATA' as (str:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:int,char:chararray);
a2 = foreach a1 generate num as num;
dump a2;

This gives the output:

(1)
(2)
(3)

And

a = load 'testSchemaDATA' as (str:chararray);
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:int,char:chararray);
a2 = foreach a1 generate char as char;
dump a2;

This gives the output:

(a)
(b)
(c)

The difference is that here the AS clause explicitly declares the types of the fields produced by STRSPLIT (int and chararray). When no type is given, a field defaults to bytearray.
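If you would rather keep the explicit (int) cast from the question, a variant along the following lines may be worth trying. It is only a sketch, not verified against 0.11: it assumes that declaring both fields as chararray in the AS clause is honored the same way as the int/chararray declaration above, so that the cast in a2 operates on a typed field. The describe statement is there to check what schema a1 actually ends up with.

```pig
a = load 'testSchemaDATA' as (str:chararray);
-- declare both split fields as chararray instead of leaving them untyped
a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num:chararray, char:chararray);
-- cast the typed field, as in the script from the question
a2 = foreach a1 generate (int)num as num;
-- inspect the schema before relying on the cast
describe a1;
dump a2;
```

If describe a1 still reports NULL types here, the assumption does not hold on your version and the fully typed declaration shown above is the safer route.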

If you write a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as num; then describe a1 gives:

a1: {num: bytearray}

If you write a1 = foreach a generate flatten(STRSPLIT(str,'_',2)) as (num,char); then describe a1 gives:

a1: {num: NULL,char: NULL}

It looks like the type comes back as NULL in this case. I am not sure why this is so. If anyone can explain it, that would be great.