I am using Apache PIG to reduce data originally stored in CSV format and want to write the output in Avro. Part of my PIG script calls a Java UDF that appends a few fields to the input Tuple and passes the modified Tuple back. To account for the new fields, I modify the PIG output schema using:
Schema outSchema = new Schema(input).getField(1).schema;
Schema recSchema = outSchema.getField(0).schema;
recSchema.add(new FieldSchema("aircrafttype", DataType.CHARARRAY));
This is inside the public Schema outputSchema(Schema input) method of my UDF.
Within the exec method, I append java.lang.String values to the input Tuple and return the edited Tuple to the PIG script. This and all subsequent operations work fine. If I output to CSV format using PigStorage(','), there are no problems. However, when I attempt to output using:
STORE records INTO '$out_dir' USING org.apache.pig.piggybank.storage.avro.AvroStorage('
{
"schema":{
"type":"record", "name":"my new data",
"fields": [
{"name":"fld1", "type":"long"},
{"name":"fld2", "type":"string"}
]}
}');
I get the following error:
java.io.IOException: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
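My reading of this message is that somewhere in the write path a field value is being cast to Utf8 while my Tuple still holds a plain String. The cast failure itself can be reproduced outside PIG. The sketch below is only an illustration: FakeUtf8 is a hypothetical stand-in I wrote for org.apache.avro.util.Utf8 (it is not the real Avro class), and "B738" is a made-up field value.

```java
// Hypothetical stand-in for org.apache.avro.util.Utf8; NOT the real Avro class.
// Like the real one, it is a CharSequence that is unrelated to java.lang.String.
final class FakeUtf8 implements CharSequence {
    private final String value;

    FakeUtf8(String value) { this.value = value; }

    @Override public int length() { return value.length(); }
    @Override public char charAt(int index) { return value.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return new FakeUtf8(value.substring(start, end));
    }
    @Override public String toString() { return value; }
}

class CastDemo {
    // Returns true if casting the given object to FakeUtf8 throws ClassCastException.
    static boolean castFails(Object field) {
        try {
            FakeUtf8 ignored = (FakeUtf8) field;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object fromUdf = "B738"; // a java.lang.String, like the values my UDF appends
        System.out.println(castFails(fromUdf));              // prints "true"
        System.out.println(castFails(new FakeUtf8("B738"))); // prints "false"
    }
}
```

So the error looks like a type mismatch between what my UDF puts in the Tuple and what the Avro writer expects for a "string" field, rather than a problem with the data itself.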
I have attempted appending the character fields to the Tuple (within my UDF) as char[] and Utf8 types, but PIG then fails before I even get to writing out the data. I have also attempted modifying my Avro schema to allow for null types in every field.
I'm using PIG v0.11.1 and Avro v1.7.5. Any help is much appreciated.