Unable to load data in Hive partitioned table

Question

I have created a table in Hive with the following query:

create table if not exists employee(CASE_NUMBER String,
                                         CASE_STATUS String,
                                         CASE_RECEIVED_DATE DATE,
                                         DECISION_DATE  DATE,
                                         EMPLOYER_NAME STRING,
                                         PREVAILING_WAGE_PER_YEAR BIGINT,
                                         PAID_WAGE_PER_YEAR BIGINT,
                                         order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';

I tried loading data into the create table using below query:

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee  partition (JOB_TITLE_SUBGROUP);

For the partitioned table, I have even set following configuration :

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

But I am getting below error while executing the load query:

Your query has the following error(s):

Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [job_title_subgroup, ], values [])

Please help.

sure please, its: I-200-14073-248840,denied,3/14/14,3/21/14,"ADVANCED TECHNOLOGY GROUP USA, INC.",NA,62171,1,software engineer — Mohit Sudhera

Jaime Caffarel Jaime Caffarel · Accepted Answer · 2016-08-07T17:27:48

If you want to load data into a Hive partition, you have to provide the value of the partition itself in the LOAD DATA query. So in this case, your query would be something like this.

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");

Where "Value" is the name of the partition in which you are loading your data. The reason is because Hive will use "Value" to create the directory in which your .csv is going to be stored, which will be something like this: .../employee/JOB_TITLE_SUBGROUP=Value. I hope this helps.

Check the documentation for details on the LOAD DATA syntax.

EDITED

Since the table has dynamic partition, one solution would be loading the .csv into an external table (e.g. employee_external) and then execute an INSERT command like this:

INSERT OVERWRITE INTO TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external

Unable to load data in Hive partitioned table

2 Answers