I am having an issue creating a Hive external table over Avro data with an Avro schema:
Steps followed:
- Imported data from MySQL to HDFS as Avro using Sqoop.
- Copied the .avsc schema file from the local filesystem to HDFS (opened the file and verified the schema is as expected).
- Verified the data is present in HDFS as a result of the Sqoop import.
- Created an external table with the schema pointing to the .avsc file from step 2 and the data location pointing to the directory from step 3.
- The Hive command line reports OK, table created. SHOW TABLES displays the table, and I verified from Hue that the file locations are all fine.
When I query the table from the Hive command line, I get an error:
java.io.IOException:java.io.IOException: Not a data file.
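For context on what that error means: Hive's AvroSerDe raises "Not a data file." when a file under the table's LOCATION does not start with the Avro object container magic bytes ("Obj" followed by 0x01, per the Avro spec). A minimal local check, assuming you pull a part file down with hadoop fs -get first (the path names below are just examples):

```python
# Avro object container files begin with the 4-byte magic b"Obj\x01".
# Any file under the table LOCATION that lacks this header (a JSON
# .avsc schema, a plain-text Sqoop import, etc.) trips Hive's reader.
AVRO_MAGIC = b"Obj\x01"

def is_avro_container(path):
    """Return True if the file starts with the Avro container magic."""
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC
```

Running this over each file fetched from the sqoopAvro directory shows immediately whether the import really produced Avro containers or, say, plain text.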
hive> CREATE EXTERNAL TABLE departmentsAvro2
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
      LOCATION 'hdfs://quickstart.cloudera/user/cloudera/sqoopAvro'
      TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart.cloudera/user/cloudera/departments.avsc');
Output:
OK
Time taken: 0.092 seconds
hive> show tables;
Output:
OK
departmentsavro2
order_items
orders
Time taken: 0.016 seconds, Fetched: 12 row(s)
hive> select * from departmentsavro2;
Output:
OK
Failed with exception java.io.IOException:java.io.IOException: Not a data file.
Time taken: 0.145 seconds
As suggested in some threads, I granted full rwx permissions to the .avsc and data files in HDFS.
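Since permissions didn't help, one thing worth ruling out is a stray non-Avro file sitting in the data directory itself (for example, the .avsc accidentally copied next to the part files). Hadoop's default input filter skips names starting with "_" or "." (so _SUCCESS is ignored), but anything else gets read as Avro. A sketch that flags offenders, assuming the directory has been copied locally with hadoop fs -get:

```python
import os

AVRO_MAGIC = b"Obj\x01"  # Avro container header per the Avro spec

def non_avro_files(data_dir):
    """List files in data_dir that are not Avro containers.

    Names starting with '_' or '.' are skipped, mirroring Hadoop's
    default hidden-file filter; everything else must be a valid
    Avro container or Hive fails with "Not a data file."
    """
    bad = []
    for name in sorted(os.listdir(data_dir)):
        if name.startswith(("_", ".")):
            continue  # hidden to Hadoop's input format, safe to ignore
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            if f.read(4) != AVRO_MAGIC:
                bad.append(name)
    return bad
```

If this flags a file like departments.avsc inside the data directory, moving the schema file out of the table's LOCATION should fix the query.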
Any pointers?