
I am new to Pig and I am trying to create a UDF that takes a tuple and returns multiple tuples based on a delimiter. I have written a UDF to read the data file below:

2012/01/01 Name1 Category1|Category2|Category3
2012/01/01 Name2 Category2|Category3
2012/01/01 Name3 Category1|Category5

Basically, I am trying to read the $2 field to get the following output:

Category1, Category2, Category3
Category2, Category3
Category1, Category5

Below is the UDF code I have written:

    package com.test.multipleTuple;

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    public class TupleToMultipleTuple extends EvalFunc<String> {

        public String exec(Tuple input) throws IOException {

            // Tuple that collects the split values
            Tuple aux = TupleFactory.getInstance().newTuple();

            if (input == null || input.size() == 0)
                return null;
            try {
                String del = "\\|";
                String str = (String) input.get(0);

                String field[] = str.split(del);
                for (String nxt : field) {
                    // Assumed loop body (missing from the post): collect each value
                    aux.append(nxt);
                }
            } catch (Exception e) {
                throw new IOException("Caught exception processing input row ", e);
            }

            return aux.toDelimitedString(",");
        }
    }

I packaged it as TupleToMultipleTuple.jar.

But I get the error below when executing it:

 Pig Stack Trace
    ERROR 1066: Unable to open iterator for alias B

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
        at org.apache.pig.PigServer.openIterator(PigServer.java:892)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:547)
        at org.apache.pig.Main.main(Main.java:158)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:884)
        ... 13 more

Can you please help me rectify the issue? Thanks.

Pig script for applying the UDF:

    REGISTER TupleToMultipleTuple.jar;
    DEFINE myFunc com.test.multipleTuple.TupleToMultipleTuple();
    A = load 'data.txt' USING PigStorage(' ');
    B = foreach A generate myFunc($2);
    dump B;

Please add your Pig script. - 54l3d

@54l3d: Actually, this time I was looking to fix my UDF, and I would really appreciate your help with that. The built-in function flatten(STRSPLIT($2,'[|]',3)) worked absolutely fine; I am trying to figure out why my code gives an error when executing the UDF. Thanks. - Arpan

@54l3d: I have also added the Pig script to the question, showing how I am trying to execute my UDF. - Arpan

2 Answers


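You can use the built-in STRSPLIT function together with FLATTEN, for example FLATTEN(STRSPLIT($2,'[|]',3)) with the results aliased as cat1, cat2 and cat3.

You will get three fields named cat1, cat2 and cat3, typed as chararray and separated by the current delimiter of the relation they belong to.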
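
For intuition about the regex and the limit argument, here is a minimal plain-Java sketch of the same kind of split. It assumes STRSPLIT behaves like Java's String.split(regex, limit), which is my understanding but worth checking against the Pig docs. SplitSketch and splitCategories are hypothetical names, not part of Pig.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only; not part of Pig.
class SplitSketch {

    // Splits on a literal '|' into at most `limit` pieces.
    static List<String> splitCategories(String value, int limit) {
        // "[|]" is a character class, so '|' is matched literally
        // instead of being treated as regex alternation.
        return Arrays.asList(value.split("[|]", limit));
    }

    public static void main(String[] args) {
        // Sample values taken from the question's data file.
        System.out.println(splitCategories("Category1|Category2|Category3", 3));
        System.out.println(splitCategories("Category2|Category3", 3));
    }
}
```

Note that with a limit of 3, a value with more than three categories would keep the remainder (including its '|' characters) in the last piece.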


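Found the issue. The problem was converting the DataByteArray to a String: because the relation is loaded without a schema, $2 reaches the UDF as a DataByteArray, so the (String) cast fails. Calling toString() on the field fixes it: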

    package com.test.multipleTuple;

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    public class TupleToMultipleTuple extends EvalFunc<String> {

        public String exec(Tuple input) throws IOException {

            // Tuple that collects the split values
            Tuple aux = TupleFactory.getInstance().newTuple();

            if (input == null || input.size() == 0)
                return null;
            try {
                String del = "\\|";
                // toString() works whether the field is a DataByteArray or a String
                String str = input.get(0).toString();

                String field[] = str.split(del);
                for (String nxt : field) {
                    // Assumed loop body (missing from the post): collect each value
                    aux.append(nxt);
                }
            } catch (Exception e) {
                throw new IOException("Caught exception processing input row ", e);
            }

            return aux.toDelimitedString(",");
        }
    }
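
To see the difference between the cast and toString() without a Pig installation, here is a rough standalone sketch. RawBytes is a hypothetical stand-in for a non-String field value such as Pig's DataByteArray (assuming, as I believe is the case, that its toString() decodes the bytes to text); CastSketch and its methods are likewise made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a non-String field value (e.g. a byte-array wrapper).
class RawBytes {
    private final byte[] data;

    RawBytes(String s) {
        this.data = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String toString() {
        // Decode the stored bytes back into text.
        return new String(data, StandardCharsets.UTF_8);
    }
}

// Hypothetical helper class for illustration only; not part of Pig.
class CastSketch {

    // Mirrors the original UDF: fails unless the object really is a String.
    static String viaCast(Object field) {
        return (String) field;
    }

    // Mirrors the fixed UDF: convert with toString(), then split and re-join.
    static String joinCategories(Object field) {
        return String.join(",", field.toString().split("\\|"));
    }

    public static void main(String[] args) {
        Object field = new RawBytes("Category1|Category2|Category3");
        try {
            viaCast(field);
        } catch (ClassCastException e) {
            System.out.println("cast failed: field is not a String");
        }
        System.out.println(joinCategories(field));
    }
}
```

In the real UDF, the exception raised by the cast is wrapped in the IOException from the catch block, which is why the job fails and Pig only reports ERROR 1066 on the front end; the underlying cause shows up in the task logs.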