Issue while updating hive table

Question

I am trying to update records into hive table. I am creating hive table from another existing hive table using following syntax (This is how we need to create table into automated script):

CREATE TABLE employee_master STORED as ORC as SELECT * FROM EMPLOYEE_SAMPLE;
ALTER TABLE employee_master clustered by (employeeid) into 2 buckets;
ALTER TABLE employee_master SET TBLPROPERTIES ('transactional'='true');`

After running above commands, employee_master table get created with ACID supports as following: I loaded 100 records from another similar table. 10 sample records from the table are as following:

"employeeid","name","jobtitle","department","annualsalary"
"100","ADAMCZYK LAWRENCE M","FIREFIGHTER","FIRE","79926"
"52","ACCIARI NICHOLAS B","MOTOR TRUCK DRIVER","STREETS & SAN","71469"
"26","ABRAHAM DALINA D","COMMUNITY SERVICES","FAMILY & SUPPORT","69684"
"54","ACEVEDO AARON F","POLICE OFFICER","POLICE","60918"
"14","ABDOLLAHZADEH ALI","PARAMEDIC","FIRE","64374"
"56","ACEVEDO BIENVENIDO","POLICE OFFICER","POLICE","77238"
"28","ABRAHAM GODWIN K","TECHNICIAN V","BUSINESS AFFAIRS","84888"
"58","ACEVEDO DAILHA","DISEASE CONTROL INVESTIGATOR II","HEALTH","48108"
"8","ABBOTT BETTY L","FOSTER GRANDPARENT","FAMILY & SUPPORT","2756"
"60","ACEVEDO EDWIN C","FIREFIGHTER","FIRE","83148"

When I ran select * on the table, I can see all 100 records and select count (*) command also showing count as 100, i.e. table contains 100 records.

Directory structure for the employee_master table is shown as following:

Now I want to update 5 records from above sample records one by one using following queries:

Update employee_master SET department="FIRE DEPARTMENT" where employeeid=100;
Update employee_master SET department="STREETS & SAN DEPARTMENT" where employeeid=52;
Update employee_master SET department="FAMILY & SUPPORT SERVICES" where employeeid=26;
Update employee_master SET department="POLICE DEPARTMENT" where employeeid=54;
Update employee_master SET department="FIRE DEPARTMENT" where employeeid=14;

While updating employee_master table as per the HIVE transaction documents, one delta file is created corresponding to one row update. After updating all above records, file structure for the employee_master table is shown as following:

While running, count () and statistics command, I still see 100 records as output. After Major Compaction ran (manfully or automatically) on the table, I can see only 11 records in the table as an output of select () command. Means, when I run select () from employee_master, I can see only 11 records as following while which is not correct. Output should show all 100 records as before updating the table; While running count() and statistics command as below; is showing count as 100 records as following:

I have tried this exercise numerous times in the same way on different- different tables/servers but all showing same unexpected output. Not sure if I am missing something or this is a bug.

NOTE: I tried once creating ORC transactional table using create table syntax as following:

CREATE IF NOT EXISTS employee_master (
    employeeid INT,
    name string,
    jobtitle string,
    department string,
    annualsalary DECIMAL(10,2))
COMMENT "Employee Master - Table"
CLUSTERED BY (employeeid) INTO 2 BUCKETS
STORED AS ORC 
TBLPROPERTIES ("transactional"="true")

Then I did not face above issue but this is not fitting into our current requirement.

I am using following software versions:

HDP-2.4.0.0-169
Hive 1.2.1000
Centos 6.8
Running Hive on TEZ client

Jaime Caffarel Jaime Caffarel · Accepted Answer · 2016-10-23T15:59:41

Would updating Hive to >= 1.3 be an option for you? According to the documentation:

Schema changes using ALTER TABLE is NOT supported for ACID tables. HIVE-11421 is tracking it. Fixed in 1.3.0/2.0.0.

Since you are using Hive 1.2.100 and you are trying to execute the ALTER TABLE commands, I think the issues you're having are related to that.

Issue while updating hive table

2 Answers