
I have Sqooped data from a Netezza table and the output file is in HDFS, but one column is a timestamp and I want to load it as a date column in my Hive table. Using that column, I want to create partitions by date. How can I do that?

Example: in HDFS the data looks like 2013-07-30 11:08:36.

In Hive I want to load only the date (2013-07-30), not the timestamp, and I want to partition on that column daily.

How can I pass the partition column dynamically?

I have tried loading the data into one table as a source. On the final table I would then do something like INSERT OVERWRITE TABLE ... PARTITION (date_column=<dynamic date>) SELECT * FROM table1.


2 Answers


Set these two properties so Hive allows dynamic partitions; nonstrict mode lets every partition value be determined at query time:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

And the query can look like this (target_table stands for your final partitioned table):

INSERT OVERWRITE TABLE target_table PARTITION (DATE_STR)
SELECT
  -- ... your other columns ...
  -- the dynamic partition column must come last in the SELECT list
  to_date(date_column) AS DATE_STR
FROM table1;
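
For this to work, the target table must already exist with the partition column declared. A minimal DDL sketch, with hypothetical column names (only DATE_STR and table1 come from the query above):

-- Hypothetical sketch: the partition column is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE target_table (
  name STRING,
  age  INT
)
PARTITIONED BY (DATE_STR STRING);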

You can also explore the two partition options of Sqoop's hive-import; with an incremental import you will be able to load the current day's partition directly:

--hive-partition-key    
--hive-partition-value 
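
For example, a one-partition-per-run import might look like the sketch below. The connection string, credentials, and table names are hypothetical, and the date value would typically be supplied by your scheduler for the current day:

sqoop import \
  --connect jdbc:netezza://nz-host:5480/MYDB \
  --username myuser -P \
  --table SRC_TABLE \
  --hive-import \
  --hive-table target_table \
  --hive-partition-key date_str \
  --hive-partition-value "2013-07-30"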

You can load the EMP_HISTORY table from EMP by enabling dynamic partitioning and converting the timestamp to a date with the to_date() function.

The code might look something like this:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE EMP_HISTORY PARTITION (join_date)
SELECT e.name AS name,
       e.age AS age,
       e.salary AS salary,
       e.loc AS loc,
       to_date(e.join_date) AS join_date
FROM EMP e;
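
Assuming EMP and EMP_HISTORY match your schema, you can check afterwards that one partition per day was created:

SHOW PARTITIONS EMP_HISTORY;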