Question (32 votes)

I have an ORC table in Hive and I want to drop a column from it:

ALTER TABLE table_name DROP col_name;

but I get the following exception:

Error occurred executing hive query: OK FAILED: ParseException line 1:35 mismatched input 'user_id1' expecting PARTITION near 'drop' in drop partition statement

Can anyone help me or suggest a way to do this? Note: I am using Hive 0.14.


7 Answers

Answer (40 votes)

You cannot drop a column directly from a table with ALTER TABLE table_name DROP col_name;.

The only way to drop a column is to use the REPLACE COLUMNS clause. Say I have a table emp with columns id, name, and dept, and I want to drop the id column. List every column you want the table to keep in the REPLACE COLUMNS clause; the command below drops id from emp:

ALTER TABLE emp REPLACE COLUMNS (name string, dept string);
Answer (8 votes)

There is also a "dumb" way of achieving the end goal: create a new table without the unwanted column(s). Hive's regex column matching makes this fairly easy.

Here is what I would do:

-- make a copy of the old table
ALTER TABLE table RENAME TO table_to_dump;

-- make the new table without the columns to be deleted
CREATE TABLE table AS
SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;

-- drop the renamed copy
DROP TABLE table_to_dump;

If the table in question is not too big, this should work just fine.
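One caveat worth noting: on Hive 0.13 and later, back-quoted identifiers are treated as plain column names by default, so the regex SELECT above generally only works after switching quoted-identifier support off (a hedged note; verify the setting against your Hive version):

```sql
-- interpret back-quoted strings as Java regexes over column names
SET hive.support.quoted.identifiers=none;

SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;
```

The regex keeps every column whose full name does not match one of the excluded names, thanks to the possessive `?+` group followed by `.+`.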

Answer (4 votes)

Suppose you have an external table, organization.employee, defined as follows (TBLPROPERTIES omitted):

hive> show create table organization.employee;
OK
CREATE EXTERNAL TABLE `organization.employee`(
      `employee_id` bigint,
      `employee_name` string,
      `updated_by` string,
      `updated_date` timestamp)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

You want to remove the updated_by and updated_date columns from the table. Follow these steps:

create a temp table replica of organization.employee as:

hive> create table organization.employee_temp as select * from organization.employee;

drop the main table organization.employee.

hive> drop table organization.employee;

remove the underlying data from HDFS (you need to exit the hive shell):

[nameet@ip-80-108-1-111 myfile]$ hadoop fs -rm hdfs://getnamenode/apps/hive/warehouse/organization.db/employee/*

create the table with removed columns as required:

hive> CREATE EXTERNAL TABLE `organization.employee`(
  `employee_id` bigint,
  `employee_name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

insert the original records back into the original table:

hive> insert into organization.employee 
select employee_id, employee_name from organization.employee_temp;

finally, drop the temp table you created:

hive> drop table organization.employee_temp;
Answer (2 votes)
ALTER TABLE emp REPLACE COLUMNS( name string, dept string);

The statement above only changes the schema of a table, not its data. A solution to this problem is to copy the data into a new table:

INSERT INTO <New Table> SELECT <selective columns> FROM <Old Table>;
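As a concrete sketch, using the emp table from the earlier answer (the new table name emp_new is assumed for illustration), copy only the columns you want to keep:

```sql
-- hypothetical target table keeping everything except id
CREATE TABLE emp_new (name string, dept string);

INSERT INTO TABLE emp_new
SELECT name, dept FROM emp;
```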
Answer (1 vote)

ALTER TABLE is not yet supported for non-native tables; i.e. what you get with CREATE TABLE when a STORED BY clause is specified.

See https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
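For context, a non-native table is one whose CREATE TABLE delegates storage to a handler via STORED BY, e.g. the HBase storage handler (a sketch; the table name and column mapping are illustrative):

```sql
-- non-native table backed by HBase through a storage handler
CREATE TABLE hbase_emp (key int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name");
```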

Answer (-1 vote)

For an external table this is simple and easy: drop the table (the schema only), edit the CREATE TABLE statement, then recreate the table with the new schema. Example table: aparup_test.tbl_schema_change, dropping the column id. Steps:

------------- show create table to fetch schema ------------------

spark.sql("""
show create table aparup_test.tbl_schema_change
""").show(100,False)

o/p:
CREATE EXTERNAL TABLE aparup_test.tbl_schema_change(name STRING, time_details TIMESTAMP, id BIGINT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://aparup_test/tbl_schema_change'
TBLPROPERTIES (
  'parquet.compress' = 'snappy'
)

------------- drop table --------------------------------

spark.sql("""
drop table aparup_test.tbl_schema_change
""").show(100,False)

------------- edit create table schema by dropping column "id"------------------

spark.sql("""
CREATE EXTERNAL TABLE aparup_test.tbl_schema_change(name STRING, time_details TIMESTAMP)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://aparup_test/tbl_schema_change'
TBLPROPERTIES (
  'parquet.compress' = 'snappy'
)
""")

------------- sync up table schema with parquet files ------------------

spark.sql("""
msck repair table aparup_test.tbl_schema_change
""").show(100,False)

==================== DONE =====================================
Answer (-7 votes)

Even the query below works for me.

ALTER TABLE tbl_name DROP col_name;