2
votes

I created a Hive table with Non-partition table and using select query I inserted data into Partitioned Hive table.

Refered site

  1. By following above link my partition table contains duplicate values. Below are the setps

This is my Sample employee dataset:link1

I tried the following queries: link2

But after updating a value in Hive table,

Updating salary of Steven with EmployeeID 19 to 50000.

INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (Department = 'A') SELECT employeeid,firstname,designation, CASE WHEN employeeid=19 THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail;

the values are getting duplicated.

7       Nirmal  Tech    12000   A
7       Nirmal  Tech    12000   B

Nirmal is placed in Department A only but it is duplicated to department B.

Am I doing anything wrong?

Please suggest.

2
It seems like in your first INSERT to the new table all the row are selected. What does SELECT EmployeeID, FirstName,Designation,Salary FROM Unm_Dup_Parti_Trail WHERE department='A'; return?sfotiadis
@kabamaru:It return the 9 records without duplicatesUSB
Try to also select the department field on your select and see if this works for you. Eg: SELECT EmployeeID, FirstName,Designation,Salary, Department FROM Unm_Dup_Parti_Trail WHERE department='A';sfotiadis
Unm_Dup_Parti_Trail table does not contains any duplicate , duplicates are in partitioned table Unm_Parti_Trail.Unm_Dup_Parti_Trail table is created to load data into a partitioned table.I followed the above link.Dont know if it is the correct way.USB
@kabamaru: This is the link I refered blog.safaribooksonline.com/2012/12/03/…USB

2 Answers

2
votes

It seems like you forgot the WHERE clause in your last INSERT OVERWRITE:

INSERT INTO TABLE Unm_Parti_Trail PARTITION (Department = 'A') 
SELECT employeeid,firstname,designation, CASE WHEN employeeid=19 
THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail 
WHERE department = 'A';
1
votes

One possible solution.

When you do the insert it is necessary to select the partitioning fields as the last ones on the query. Eg:

INSERT INTO TABLE Unm_Parti_Trail PARTITION(department='A') 
SELECT EmployeeID, FirstName,Designation,Salary, Department 
FROM Unm_Dup_Parti_Trail
WHERE department='A';

See this link for more info.