0
votes

There is a HIVE table with around 100 columns, partitioned by columns ClientNumber and Date. I am trying to insert data from another HIVE table into only 30 columns as well as create Date partitions dynamically.

The issue is that all data gets loaded into "ClientNumber=123/date=__HIVE_DEFAULT_PARTITION__" partition which is not quite expected.

The script looks like this:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

FROM DBName.Table2
INSERT INTO TABLE DBName.Table1
PARTITION(ClientNumber=123, `Date`)   
(col1, col2, ..., col29, `Date`) 
SELECT 
col1, col2, ..., col29, eventDate as `Date`

Date values retrieved by the query are all correct, no NULL or other uncharacteristic values. Removing or adding Dynamic partitioning parameters makes no difference. Providing specific Date value (instead of dynamic query-populated value) results in creation of correct partitions.

Is there any trick I am missing or some issue with populating specific number of columns in partitioned tables?

P.S. Is listing all other columns (col31 - col100) in import script as NULL worth considering?

3
Update: 1. Date column is stated as partitioning field during table creation. Reserved keyword or alias for column is not an issue — works in other scenarios as well. 2. Listing all remaining columns using 'cast(NULL as string) as colN' definition can be considered workaround for this case. However, the script becomes much larger and I would still like to understand the issue with INSERT of specific fields.VasiliK

3 Answers

0
votes

If I get you question right, You should not have 'date' field in insert or select. If you are using aliase it may break requirments of partition rule. as it id expecting same field which is defined schema. the script should be ...

FROM DBName.Table2
INSERT INTO TABLE DBName.Table1
PARTITION(ClientNumber=123, Date)   
SELECT col1, col2, ..., col29, eventDate as Date
0
votes

Yes, there is a better way to supply many column names. You can use Hive's "Regex Column Specification": regular expressions for column names. Your query would be:

    SET hive.support.quoted.identifiers=none;
    FROM DBName.Table2
    INSERT INTO TABLE DBName.Table1
    PARTITION(ClientNumber=123, Date)   
    SELECT `(eventDate)?+.+`, eventDate as Date;

This means "select all names except eventDate from Table2".

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification

This is still clumsy for some use cases, and other places I've worked at wrote scripts to autogenerate Hive queries from table schemas.

0
votes

In case it could be usefull for someone it is worth to read the answer in Corrupt rows written to __HIVE_DEFAULT_PARTITION__ when attempting to overwrite Hive partition

I have spent some silly hours with the same issue before I have realized that there is limitation that the partitioned column should be on the end of the DF. In my case I have simply changed the SQL query for my DF in a way that the partitioned column is selected as last.