0
votes

I have a record in this format:

{(Larry Page),23,M}
{(Suman Dey),22,M}
{(Palani Pratap),25,M}

I am trying to LOAD the record using this:

records = LOAD '~/Documents/PigBag.txt' AS (details:BAG{name:tuple(fullname:chararray),age:int,gender:chararray});

But I am getting this error:

2015-02-04 20:09:41,556 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 7, column 101>  mismatched input ',' expecting RIGHT_CURLY

Please advise.

2 Answers

1
votes

It's not a bag since it's not made up of tuples. Try

load ... as (name:tuple(fullname:chararray), age:int, gender:chararray)

For some reason Pig wraps each output line in curly braces, which makes it look like a bag, but it is not one. If you saved this data using PigStorage, you can store it with the '-schema' option, which tells PigStorage to write a schema file (.pig_schema) that you can inspect to see the saved schema. PigStorage also uses that file when loading, which saves you the AS clause.
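A minimal sketch of that round trip. The paths 'people.csv' and 'people_out', the comma delimiter, and the flat three-field layout are assumptions for illustration, not taken from the question:

```pig
-- Hypothetical flat input, one "name,age,gender" record per line
records = LOAD 'people.csv' USING PigStorage(',')
          AS (fullname:chararray, age:int, gender:chararray);

-- '-schema' asks PigStorage to write a hidden .pig_schema file next to the data
STORE records INTO 'people_out' USING PigStorage(',', '-schema');

-- On load, PigStorage reads .pig_schema if it is present, so no AS clause is needed
reloaded = LOAD 'people_out' USING PigStorage(',');
DESCRIBE reloaded;
```

The .pig_schema file is plain JSON, so you can also open it directly in the output directory to check the field names and types.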

1
votes

Yes, LiMuBei's point is absolutely right. Your input is not in the right format. Pig expects a bag to hold a collection of tuples, but in your case it holds a tuple followed by plain fields. In this case Pig will retain the tuple and reject the fields (age and gender) during load.

But this problem can be solved with a different approach (a somewhat hacky solution):
1. Load each input line as a chararray.
2. Remove the curly braces and parentheses from the input.
3. Use the STRSPLIT function to split the input into (name, age, sex) fields.

PigScript:

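-- Read each record as a single chararray
A = LOAD 'input' USING PigStorage AS (line:chararray);
-- Strip every '{', '}', '(' and ')' from the line
B = FOREACH A GENERATE FLATTEN(REPLACE(line,'[}{)(]+','')) AS (newline:chararray);
-- Split on ',' into at most three parts and name the resulting fields
C = FOREACH B GENERATE FLATTEN(STRSPLIT(newline,',',3)) AS (fullname:chararray,age:int,sex:chararray);
DUMP C;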

Output:

(Larry Page,23,M)
(Suman Dey,22,M)
(Palani Pratap,25,M)

Now you can access all the fields by name: fullname, age and sex.
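As a small illustration of using those names, the sketch below continues from relation C in the script above. The age threshold and the relation names older and names are arbitrary choices for the example:

```pig
-- Keep only the rows whose age field is greater than 22
older = FILTER C BY age > 22;

-- Project two of the named fields
names = FOREACH older GENERATE fullname, sex;
DUMP names;
```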