6
votes

I have created a table in Hive with the following query:

create table if not exists employee(CASE_NUMBER String,
                                         CASE_STATUS String,
                                         CASE_RECEIVED_DATE DATE,
                                         DECISION_DATE  DATE,
                                         EMPLOYER_NAME STRING,
                                         PREVAILING_WAGE_PER_YEAR BIGINT,
                                         PAID_WAGE_PER_YEAR BIGINT,
                                         order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';
                                         

I tried loading data into the create table using below query:

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee  partition (JOB_TITLE_SUBGROUP);

For the partitioned table, I have even set following configuration :

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

But I am getting below error while executing the load query:

Your query has the following error(s):

Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [job_title_subgroup, ], values [])

Please help.

2
could you provide the first few lines of your CSV file ?cheseaux
sure please, its: I-200-14073-248840,denied,3/14/14,3/21/14,"ADVANCED TECHNOLOGY GROUP USA, INC.",NA,62171,1,software engineerMohit Sudhera

2 Answers

3
votes

If you want to load data into a Hive partition, you have to provide the value of the partition itself in the LOAD DATA query. So in this case, your query would be something like this.

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");

Where "Value" is the name of the partition in which you are loading your data. The reason is because Hive will use "Value" to create the directory in which your .csv is going to be stored, which will be something like this: .../employee/JOB_TITLE_SUBGROUP=Value. I hope this helps.

Check the documentation for details on the LOAD DATA syntax.

EDITED

Since the table has dynamic partition, one solution would be loading the .csv into an external table (e.g. employee_external) and then execute an INSERT command like this:

INSERT OVERWRITE INTO TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external
2
votes

I might be little late to reply but can try below steps:

  1. Set below properties first :

    Ø set hive.exec.dynamic.partition.mode=nonstrict;
    Ø set hive.exec.dynamic.partition=true;
    
  2. Create temp table first:

    CREATE EXTERNAL TABLE IF NOT EXISTS employee_temp(
    ID STRING,
    Name STRING,
    Salary STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    tblproperties ("skip.header.line.count"="1");
    
  3. Load Data in temporary table:

    hive> LOAD DATA INPATH 'filepath/employee.csv' OVERWRITE INTO TABLE employee; 
    
  4. Create Partitioned Table:

    CREATE EXTERNAL TABLE IF NOT EXISTS employee_part(
    ID STRING,
    Name STRING)
    PARTITIONED BY (Salary STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    tblproperties ("skip.header.line.count"="1");
    
  5. Load Data into partitioned table from intermediate / temp table:

    INSERT OVERWRITE TABLE employee_part PARTITION (SALARY) SELECT * FROM employee;