2
votes

Hive dynamic partition in insert overwrite from select statement is not loading the data for the dynamic partition instead it gives the data HIVE_DEFAULT_PARTITION.

If I say show partitions table2;

it just give only one partition detail which is HIVE_DEFAULT_PARTITION

I have a staging table without any partitions which reads data from a sequence file.

CREATE EXTERNAL TABLE IF NOT EXISTS table1(DS string, col1 string, col2 string, col3 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS SEQUENCEFILE LOCATION '/user/' tblproperties ("skip.header.line.count"="1");

The date format is m/d/yy for the field DS which is coming from the file.

Then I am creating production table like below.

CREATE EXTERNAL TABLE IF NOT EXISTS table2(col1 string, col2 string, col3 string)
PARTITIONED BY (DS string) STORED AS SEQUENCEFILE tblproperties ("skip.header.line.count"="1");

set hive.exec.dynamic.partition.mode=nonstrict;

Then the insert query as below

INSERT OVERWRITE TABLE table2 PARTITION (DS) SELECT col1, col2,col3, cast(DS as date) FROM table1;

When I query the table 2 I get the result as below

d1 d2 d3 HIVE_DEFAULT_PARTITION d4 d5 d5 HIVE_DEFAULT_PARTITION

for the last column where I expect a date field from my partition column instead I get HIVE_DEFAULT_PARTITION

My expected data is like

d1 d2 d3 5/1/17 d4 d5 d5 5/1/17

Can anyone help on this?

1

1 Answers

1
votes

The only format you can cast directly to date is yyyy-mm-dd any attempt to cast from another format results in NULL, therefore all your records go to the default partition.

  • Define the partition column as DATE (PARTITIONED BY (DS date))
  • Use to_date(from_unixtime(to_unix_timestamp(DS,'M/d/y'))) for conversion

Demo

hive> select cast('5/1/17' as date) as ds;
OK
ds
NULL

hive> select cast('2015-05-01' as date) as ds;
OK
ds
2015-05-01

hive> select to_date(from_unixtime(to_unix_timestamp('5/1/17','M/d/y'))) as ds;
OK
ds
2017-05-01