
I'm using Pig 0.12, and the documentation here says it supports the datetime data type:

http://pig.apache.org/docs/r0.12.0/basic.html#data-types

But the following LOAD statement throws an UnsupportedOperationException on the first field. The HDFS location contains tab-separated files whose first field is a date in YYYY-MM-DD format.

rsa = LOAD '/mypath/*' USING PigStorage() as (
hit_date:datetime,
agency_id:long,
agency_name:chararray,
....
);

ERROR 2999: Unexpected internal error. null

java.lang.UnsupportedOperationException
    at parquet.pig.PigSchemaConverter.convertWithName(PigSchemaConverter.java:273)
    at parquet.pig.PigSchemaConverter.convert(PigSchemaConverter.java:248)
    at parquet.pig.PigSchemaConverter.convert(PigSchemaConverter.java:285)
    at parquet.pig.PigSchemaConverter.convertTypes(PigSchemaConverter.java:241)
    at parquet.pig.PigSchemaConverter.convert(PigSchemaConverter.java:234)
    at parquet.pig.TupleWriteSupport.<init>(TupleWriteSupport.java:63)
    at parquet.pig.ParquetStorer.getOutputFormat(ParquetStorer.java:103)
    at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:80)

I'm wondering what ParquetStorer is doing there. If you are really just reading text data with PigStorage, it should not be involved at all. Are you sure it is the datetime field that is causing the issue? - LiMuBei

1 Answer


Check the Notes section below the data types in the documentation you linked. It says:

There is no native constant type for datetime field. You can use a ToDate udf with chararray constant as argument to generate a datetime value.

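-- Load the date as a chararray (the default loader is tab-delimited PigStorage)
rsa = load '/mypath/*' as (
    inDateChar:chararray,
    agency_id:long,
    agency_name:chararray,
    ....
    );
-- Convert the chararray to a datetime; the second argument describes the input format
convertDate = foreach rsa generate ToDate(inDateChar, 'yyyy-MM-dd') as (inDateDT:datetime);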

The format argument of ToDate uses SimpleDateFormat pattern syntax: http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
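Because the pattern letters follow SimpleDateFormat conventions, their case matters: yyyy is the calendar year, MM the month and dd the day of month, while mm is minutes, DD is day of year and YYYY is the week year (Java 7 and later). The minimal Java sketch below is plain JDK code run outside Pig; it only illustrates how the pattern letters behave and is not Pig's ToDate implementation. The class name DatePatternDemo, its helper methods and the sample date 2014-03-15 are made up for illustration, and the formatter is pinned to UTC and Locale.US so the output does not depend on the JVM defaults.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class: shows SimpleDateFormat pattern letters, not Pig internals.
public class DatePatternDemo {
    // Build a formatter for the given pattern, pinned to UTC and Locale.US so
    // results do not depend on the JVM's default time zone or locale.
    static SimpleDateFormat formatter(String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false); // reject impossible values such as month 13
        return fmt;
    }

    // Parse "yyyy-MM-dd" text; ParseException is rethrown unchecked for brevity.
    static Date parseDay(String text) {
        try {
            return formatter("yyyy-MM-dd").parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyy-MM-dd date: " + text, e);
        }
    }

    // Format a Date with an arbitrary pattern (UTC, Locale.US).
    static String format(String pattern, Date date) {
        return formatter(pattern).format(date);
    }

    public static void main(String[] args) {
        Date d = parseDay("2014-03-15");
        System.out.println(format("yyyy-MM-dd", d)); // 2014-03-15
        // "mm" is minute-in-hour and "DD" is day-in-year, so this mixed-case
        // pattern prints different fields for the same instant.
        System.out.println(format("YYYY-mm-DD", d)); // 2014-00-74
    }
}
```

Running main prints 2014-03-15 and then 2014-00-74: the mixed-case pattern renders the same instant as minute 00 and day-of-year 74. setLenient(false) makes parseDay reject impossible input such as 2014-13-01 instead of rolling it over into the next year.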