I am trying to load the output of a Pig job into a Hive table. The data is stored in Avro format on HDFS. The Pig job simply does:

data = LOAD 'path' using AvroStorage();
data = FILTER data BY <some property>;
STORE data into 'outputpath' using AvroStorage();

I then try to load the result into a Hive table with:

load data inpath 'outputpath' into table table_with_avro_schema partition(somepartition);

However, I get the following error:

FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [somepartition, ], values [])

Can someone please suggest what I am doing wrong here? Thanks a lot!

1 Answer

I just figured out that this happens because the LOAD operation does not deserialize the data. It is a pure file copy/move into the table's location. To fix it, follow these steps:

1. CREATE EXTERNAL TABLE some_table LIKE SOME_TABLE_WITH_SAME_SCHEMA;
2. LOAD DATA INPATH 'SOME_PATH' INTO TABLE some_table;
3. INSERT INTO TARGET_TABLE SELECT * FROM some_table;

In short, first load the data into an external staging table, then insert it from there into the target Hive table.
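
Put together, the steps above might look like the following HiveQL. This is only a sketch under a few assumptions: the staging table is unpartitioned, its columns match the non-partition columns of the target, and the partition value 'some_value' is a hypothetical placeholder. Note that Hive's static PARTITION clause expects key = value pairs, and the error in the question lists a key with no values, so the missing value may also be part of the problem.

```sql
-- Placeholders from the answer: some_table, SOME_TABLE_WITH_SAME_SCHEMA,
-- SOME_PATH, TARGET_TABLE. The partition value 'some_value' is hypothetical.

-- 1. Staging table that reuses the schema and storage format of an existing
--    Avro-backed table (assumed here to be unpartitioned).
CREATE EXTERNAL TABLE some_table LIKE SOME_TABLE_WITH_SAME_SCHEMA;

-- 2. Moves the files written by Pig into the staging table's location.
--    Nothing is parsed or deserialized at this point.
LOAD DATA INPATH 'SOME_PATH' INTO TABLE some_table;

-- 3. Reads the rows through the staging table's SerDe and writes them
--    into one explicit partition of the target table.
INSERT INTO TABLE TARGET_TABLE PARTITION (somepartition = 'some_value')
SELECT * FROM some_table;
```

Since the staging table is external, dropping it afterwards removes only the metadata, so the files moved in step 2 stay on HDFS until they are deleted separately.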