
I am new to Pig scripting.

I want to pass multiple parameters to a Pig filter UDF, but I am getting the error "Invalid scalar projection : A column needs to be projected from a relation for it to be used as a scalar".

I am doing the following:

    input = load '....';
    dump input; /* works: I can see the data */
    output = FILTER input by not FilterUDF(input,val1,val2);

This didn't work, so I tried the following:

    input = load '......';
    dump input; /* works: I can see the data */
    dataWithVal = FOREACH input GENERATE $0,$1,val1,val2;
    dump dataWithVal; /* works: I can see the data with the values */
    output = FILTER dataWithVal by not FilterUDF(dataWithVal);

This didn't work either. So I put my values in a file, copied that file to HDFS, and then cross-joined it with the input data, but I still got the same error.

    input = load '........';
    dump input; /* works: I can see the data */
    val = load '........';
    dump val; /* works: I can see the values */
    interData = cross input, val;
    dump interData; /* works: I can see the cross-joined data */
    output = FILTER interData by not FilterUDF(interData);

For all of the above options, I get the same error: "Invalid scalar projection : A column needs to be projected from a relation for it to be used as a scalar."

For the first case, my FilterUDF looks like this:

    import org.apache.pig.FilterFunc;
    import java.io.IOException;
    import org.apache.pig.data.Tuple;


    public class FilterUDF extends FilterFunc {
        public boolean exec(Tuple input, int val, String Val) throws IOException {
         /*some code here*/
        }
    }

I also tried this alternative for case one, but it did not work either:

    import org.apache.pig.FilterFunc;
    import java.io.IOException;
    import org.apache.pig.data.Tuple;

    public class FilterUDF extends FilterFunc {

        private Tuple input;
        private int Ival;
        private String Sval;

        public FilterUDF(Tuple input, int Ival, String Sval){
            this.input = input;
            this.Ival = Ival;
            this.Sval = Sval;
        }

        public Boolean exec(Tuple arg0) throws IOException {
        /*Some code*/   
        }
    }

For cases two and three, my FilterUDF looks like this:

    import org.apache.pig.FilterFunc;
    import java.io.IOException;
    import org.apache.pig.data.Tuple;


    public class FilterUDF extends FilterFunc {

        public Boolean exec(Tuple input) throws IOException {
        /*some code here*/
        }
    }

What am I doing wrong? How do I pass multiple parameters to a Pig UDF? What is the reason behind the "Invalid scalar projection" error?

Thanks in advance for your help.


1 Answer


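I'm not exactly sure what you are trying to compute with your UDF, because your problem description is a bit vague, but in all three of your code examples you are passing a relation to your UDF (input, dataWithVal, and interData are relations), which doesn't really make sense. Pig calls the UDF once per record, so you need to pass it values, i.e. fields of the current record. That is also what the error message is about: when a relation alias appears inside an expression, Pig tries to read it as a scalar, which only works if you project a single column from it (for example relation.column). So, say you were using your UDF to check that val1 and val2 were the same (or whatever); then you could write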

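    REGISTER myudfs.jar;  /* hypothetical jar that contains FilterUDF */
    data = LOAD '...' AS (val1:int, val2:int);  /* assumed schema; input and output are reserved words in Pig */
    filtered = FILTER data BY FilterUDF(val1, val2);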

and your UDF would look something like this:

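    import java.io.IOException;

    import org.apache.pig.FilterFunc;
    import org.apache.pig.data.Tuple;

    public class FilterUDF extends FilterFunc {
        @Override
        public Boolean exec(Tuple input) throws IOException {
            // Called once per record; the tuple holds the arguments in call order.
            if (input == null || input.size() < 2)
                return null;

            // Assumes val1 and val2 are declared as int in the LOAD schema,
            // so Pig hands them over as java.lang.Integer.
            Integer val1 = (Integer) input.get(0);  // gets val1 from Pig
            Integer val2 = (Integer) input.get(1);  // gets val2 from Pig
            if (val1 == null || val2 == null)
                return null;

            /* replace with the rest of your logic */
            return val1.equals(val2);
        }
    }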

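As you can see, you can pass as many parameters to your UDF as you want; that's what org.apache.pig.data.Tuple is for. Pig packs the arguments, in call order, into the Tuple that exec receives, so just pass as many parameters as needed and then read them inside the UDF with input.get(i), where i is the argument's zero-based position.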
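Constants can be passed alongside fields, so an int and a String like the ones in your constructor attempt can go straight into the call, e.g. FilterUDF(val1, 5, 'abc'). Below is a standalone sketch of how such a UDF might unpack its arguments. To keep it runnable without the Pig jar, a java.util.List stands in for org.apache.pig.data.Tuple (both offer size() and get(int)); the class name FilterUDFSketch, the literal arguments 5 and 'abc', and the filtering rule are all hypothetical. It assumes Pig passes an int as java.lang.Integer and a chararray as java.lang.String.

    import java.util.Arrays;
    import java.util.List;

    // Standalone sketch: java.util.List stands in for org.apache.pig.data.Tuple
    // so this compiles without the Pig jar. Like Tuple, it offers size() and get(int).
    public class FilterUDFSketch {

        // Mirrors FilterFunc.exec(Tuple): called once per record, with one entry
        // per argument of the hypothetical call FilterUDF(val1, 5, 'abc').
        public static Boolean exec(List<Object> input) {
            // Expect exactly three arguments; anything else gives no decision.
            if (input == null || input.size() != 3) {
                return null;
            }
            Object a = input.get(0); // first argument: val1 of the current record
            Object b = input.get(1); // second argument: the int literal 5
            Object c = input.get(2); // third argument: the chararray literal 'abc'
            if (a == null || b == null || c == null) {
                return null; // a null field in this record
            }
            // Assumption: an int arrives as Integer and a chararray as String.
            int value = (Integer) a;
            int threshold = (Integer) b;
            String label = (String) c;
            // Hypothetical rule: keep records whose value exceeds the threshold,
            // provided the label argument is non-empty.
            return value > threshold && !label.isEmpty();
        }

        public static void main(String[] args) {
            // Simulates FILTER data BY FilterUDF(val1, 5, 'abc') over three records.
            List<Integer> val1Column = Arrays.asList(3, 7, 10);
            for (Integer v : val1Column) {
                Boolean keep = exec(Arrays.asList(v, 5, "abc"));
                System.out.println(v + " -> " + keep);
            }
        }
    }

Running main prints `3 -> false`, `7 -> true` and `10 -> true`. In a real UDF the same code would live in exec(Tuple input), with input.get(i) in place of List.get(i); Tuple.get declares ExecException, a subclass of IOException, which is why exec is declared to throw IOException.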