5
votes

I have a Hive table which was created by joining data from multiple tables. The data for this resides in a folder which has multiple files ("0001_1" , "0001_2", ... and so on). I need to create a partitioned table based on a date field in this table called pt_dt (either by altering this table or creating a new one). Is there a way to do this?

I've tried creating a new table and inserting into it (below) which did not work

create external table table2 (acct_id bigint, eval_dt string)
partitioned by (pt_dt string);
insert into table2
partition (pt_dt) 
select acct_id, eval_dt, pt_dt
from jmx948_variable_summary;

This throws the error

"FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 189 Cumulative CPU: 401.68 sec HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 6 minutes 41 seconds 680 msec"

2

2 Answers

10
votes

Was able to figure it out after some trial & error.

Enable dynamic partitioning in Hive:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

Create schema for partitioned table:

CREATE TABLE table1 (id STRING, info STRING)
PARTITIONED BY ( tdate STRING);

Insert into partitioned table :

FROM table2 t2
INSERT OVERWRITE TABLE table1 PARTITION(tdate)
SELECT t2.id, t2.info, t2.tdate
DISTRIBUTE BY tdate;
0
votes

In the version I am working with below works (Hive 0.14.0.2.2.4.2-2)

INSERT INTO TABLE table1 PARTITION(tdate) SELECT t2.id, t2.info, t2.tdate

From the source table select the column that needs to be partitioned by last, in the above example, date is selected as the last column in Select. Similarly, if the one needs the table to be partitioned by the column "info", then

INSERT INTO TABLE table1 PARTITION(info) SELECT t2.id, , t2.tdate, t2.info

If you want to create the table with multiple partitions the select query needs to be i that order. If you want to partition the above table with "date" and then "info"

INSERT INTO TABLE table1 PARTITION(date, info) SELECT t2.id, , t2.tdate, t2.info

With "info", then "date"

INSERT INTO TABLE table1 PARTITION(info, date) SELECT t2.id, , t2.info, t2.tdate