1
votes

Currently I have a Partitioned ORC "Managed" (Wrongly created as Internal first) Hive table in Prod with atleast 100 days worth of data partitioned by year,month,day(~16GB of data).

This table has roughly 160 columns.Now my requirement is to Add a column in the middle of this table and still be able to query the older data(partitioned files).Its is fine if the newly added column shows null for the old data.

What I did so far ?

1)First convert the table to External using below to preserve data files before dropping

alter table <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');

2)Drop and Recreate the table with new column in the middle and then Altered the table to add the partition file

However I am unable to read the table after Recreation .I get this Error message

[Simba][HiveJDBCDriver](500312) Error in fetching data rows: *org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.io.IOException: ORC does not support type conversion from file type array<string> (87) to reader type int (87):33:32;

Any other way to accomplish this ?

1

1 Answers

4
votes

No need to drop and recreate the table. Simply use the following statement.

ALTER TABLE default.test_table ADD columns (column1 string,column2 string) CASCADE; 

ALTER TABLE CHANGE COLUMN with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata.

PS - This will add new columns to the end of the existing columns but before the partition columns. Unfortunately, ORC does not support adding columns in the middle as of now.

Hope that helps!