I have data in the following format:
(Id, Description)
1, xyz is something. Abc bcd & so on.
1, xyz is something. Abc xyz & so on.
2, abc is something. Abc xyz & so on.
I need output in this format:
Id, Word
I tried this:
A = LOAD './data.txt' USING PigStorage(',') as (id: int, desc:chararray);
B = FOREACH A GENERATE id, FLATTEN(STRSPLIT(desc, '[,?:;\s]'));
This puts all the words for a record into a single row, for example:
1, xyz, is, something, Abc, bcd, so, on
What I want is one row per word:
1, xyz
1, is
1, something
and so on.
How can I do this in Pig (without writing a UDF)?
PS: I also tried:
B = FOREACH A GENERATE id, FLATTEN(datafu.pig.util.TransposeTupleToBag(STRSPLIT(desc, '[.&,?:;\s]')));
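For reference, one possible direction: STRSPLIT returns a tuple, and FLATTEN on a tuple expands it into columns rather than rows, which matches the single-row output above. The built-in TOKENIZE returns a bag instead, and FLATTEN on a bag yields one row per element. Below is a minimal sketch, untested against this data. The alias `word` and the punctuation regex are my own assumptions, and it relies on TOKENIZE treating a space as a separator by default.

```pig
A = LOAD './data.txt' USING PigStorage(',') AS (id:int, desc:chararray);

-- Assumption: anything that is not a letter or digit counts as a separator.
-- REPLACE turns each run of such characters into one space, TOKENIZE splits
-- the result into a bag of words, and FLATTEN emits one (id, word) row per word.
B = FOREACH A GENERATE id,
        FLATTEN(TOKENIZE(REPLACE(desc, '[^A-Za-z0-9]+', ' '))) AS word;

DUMP B;
```

Without the REPLACE step, the default TOKENIZE separators would likely leave tokens such as `something.` and `&` in the output, so the character class should be adjusted to whatever counts as a word in the real data.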