I have an Avro file to load into Hive, but the file is in binary. Which deserializer should I use to get the binary Avro data into Hive?

I don't want the raw binary data in Hive; I want the decoded data.

This is how I create my table.

CREATE TABLE kst7
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='pathtoavsc.avsc');

With the above command the table is created and the data is loaded, but when I run select * from table I get the following error:

Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found bytes, expecting union

avsc file:

{
"namespace": "com.nimesh.tripod.avro.enrichment",
"type": "record",
"name": "EnrichmentData",
"fields": [
    {"name": "rowKey", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
    {"name": "ownerGuid", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
    {"name": "autotagsEnrichment", "type": ["bytes", "null", {
                                                        "namespace": "com.nimesh.tripod.avro.enrichment",
                                                        "type": "record",
                                                        "name": "AutotagEnrichment",
                                                        "fields": [
                                                            {"name": "version", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
                                                            {"name": "autotags", "type": ["null", {"type": "array", "items": {
                                                                                                                                 "namespace": "com.nimesh.tripod.avro.enrichment",
                                                                                                                                 "type": "record",
                                                                                                                                 "name": "Autotag",
                                                                                                                                 "fields": [
                                                                                                                                     {"name": "tag", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
                                                                                                                                     {"name": "score", "type": ["null", "double"], "default": null}
                                                                                                                                 ]
                                                                                                                             }}], "default": null}
                                                        ]
                                                    }], "default": null},
    {"name": "colorEnrichment", "type": ["bytes","null", {
                                                     "namespace": "com.nimesh.tripod.avro.enrichment",
                                                     "type": "record",
                                                     "name": "ColorEnrichment",
                                                     "fields": [
                                                         {"name": "version", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
                                                         {"name": "color", "type": ["null", {"type": "array", "items": {
                                                                                                                           "namespace": "com.nimesh.tripod.avro.enrichment",
                                                                                                                           "type": "record",
                                                                                                                           "name": "Color",
                                                                                                                           "fields": [
                                                                                                                               {"name": "color", "type": ["null", {"type":"string","avro.java.string":"String"}], "default": null},
                                                                                                                               {"name": "score", "type": ["null", "double"], "default": null}
                                                                                                                           ]
                                                                                                                       }}], "default": null}
                                                     ]
                                                 }], "default": null}
]
}

1 Answer

I think you are looking for SERDEPROPERTIES rather than TBLPROPERTIES:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url'='pathtoschema.avsc')

Otherwise, try selecting individual fields until you find the one that causes the error, then inspect which Hive type(s) each AVSC field is mapped to in the table.
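
The column-by-column check could look roughly like this. It is only a sketch, assuming the table is the kst7 from the question and that the Hive column names follow the AVSC field names. As far as I know, Hive maps an Avro union with more than one non-null branch (such as ["bytes", "null", record]) to a uniontype column, so the two enrichment fields are the first ones I would look at.

```sql
-- Show the Hive column types that were derived from the AVSC.
DESCRIBE kst7;

-- Probe one column at a time; a query that fails points at the problem column.
SELECT rowKey FROM kst7 LIMIT 5;
SELECT ownerGuid FROM kst7 LIMIT 5;
SELECT autotagsEnrichment FROM kst7 LIMIT 5;
SELECT colorEnrichment FROM kst7 LIMIT 5;
```

If every one of these queries fails with the same AvroTypeException, the mismatch may be between the schema embedded in the data file and the AVSC the table points to, rather than in a single column.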