1
votes

Here is the UDF code:

package myudf;
import java.io.IOException; 
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.pig.EvalFunc; 
import org.apache.pig.data.Tuple; 

public class DateFormat extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }

        try {
            String dateStr = (String) input.get(0);
            if (dateStr == null) {
                return null;
            }
            // Input uses a 12-hour clock with an AM/PM marker; output uses a 24-hour clock.
            SimpleDateFormat readFormat = new SimpleDateFormat("MM/dd/yyyy hh:mm:ss.SSS aa");
            SimpleDateFormat writeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
            // Let a ParseException reach the outer catch instead of formatting a null Date.
            Date date = readFormat.parse(dateStr);
            return writeFormat.format(date);
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row", e);
        }
    }
}
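
To check the conversion logic outside Pig, a minimal standalone sketch along these lines may help. It uses the same two patterns as the UDF; the class name `DateFormatCheck`, the sample input value, and the explicit `Locale.ENGLISH` are assumptions made for illustration and are not part of the original code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class DateFormatCheck {
    // Same patterns as the UDF. Locale.ENGLISH is an assumption so that
    // the "AM"/"PM" markers parse regardless of the machine's default locale.
    static String convert(String dateStr) throws ParseException {
        SimpleDateFormat readFormat =
                new SimpleDateFormat("MM/dd/yyyy hh:mm:ss.SSS aa", Locale.ENGLISH);
        SimpleDateFormat writeFormat =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.ENGLISH);
        // parse() throws ParseException on bad input, so no null Date reaches format().
        Date date = readFormat.parse(dateStr);
        return writeFormat.format(date);
    }

    public static void main(String[] args) throws ParseException {
        // Hypothetical sample value in the input layout.
        System.out.println(convert("01/15/2014 02:30:45.123 PM")); // prints 2014-01-15 14:30:45.123
    }
}
```

Running this before packaging the JAR separates date-pattern problems from Pig class-resolution problems.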

I exported this as a JAR and registered it in Grunt:

    Register /local/path/to/UDFDate.jar;
    A = LOAD 'hdfs date file';
    B = FOREACH A GENERATE UDFDate.myudf.DateFormat($0);

This gives the following error:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve UDFDate.DateFormat using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

3
What is myudf? Do you have a Java file myudf in your package UDFDate? - VK_217
What is the first line in the DateFormat.java file? - Ronak Patel
Sorry guys, I missed the package name. - Taha Naqvi

3 Answers

1
votes

You don't need to specify the JAR name (UDFDate.myudf.DateFormat) to call a function in the JAR. The call should be "packageName.className" (myudf.DateFormat).

If DateFormat is in the myudf package, call it as:

B = FOREACH A GENERATE myudf.DateFormat($0);


If DateFormat is in the default package, call it as:

B = FOREACH A GENERATE DateFormat($0);
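
The "using imports" list in the ERROR 1070 message suggests that the UDF name is tried as a class name under each listed prefix. The sketch below imitates that lookup in plain Java to show why a JAR name in front of the package cannot resolve. It is an illustration only, not Pig's actual implementation; the class `UdfNameResolution` and the method `resolve` are hypothetical, and JDK classes stand in for the UDF so the example runs without Pig.

```java
import java.util.Arrays;
import java.util.List;

class UdfNameResolution {
    // Prefixes copied from the "using imports" list in the ERROR 1070 message.
    static final List<String> IMPORTS = Arrays.asList(
            "", "java.lang.", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin.");

    // Returns the first prefix + name that loads as a class, or null if none does.
    static String resolve(String name) {
        for (String prefix : IMPORTS) {
            try {
                return Class.forName(prefix + name).getName();
            } catch (ClassNotFoundException | LinkageError e) {
                // Not found under this prefix; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A fully qualified name (package + class) resolves with the empty prefix.
        System.out.println(resolve("java.text.SimpleDateFormat"));
        // A bare class name resolves only if an import prefix supplies its package.
        System.out.println(resolve("String"));
        // Prepending an archive-style name gives a class name that does not exist.
        System.out.println(resolve("UDFDate.java.text.SimpleDateFormat"));
    }
}
```

The JAR only has to be on the classpath, which is what Register does; the name used in the script is the package plus the class name.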

0
votes

Call your UDF as:

packagename.classname($0);

0
votes

The answer has already been given, but to avoid writing the fully qualified UDF name on every call, you can define an alias for it with DEFINE:

Register /local/path/to/UDFDate.jar;
DEFINE myDateFormat myudf.DateFormat();
A = LOAD 'hdfs date file';
B = FOREACH A GENERATE myDateFormat($0);