0
votes

I'm looking for a way to modify a parquet data table in HIVE to remove some fields. The table is managed but it doesn't matter because I can convert it to external.
The problem is that I can not use the command ALTER TABLE ... REPLACE COLUMN with partitioned parquet tables.

It is works well for textfile format (partitioned or not) and only for non-partitioned parquet tables.

I've tried to replace column but this is the result:

hive> ALTER TABLE db_test.mytable REPLACE COLUMNS(name String);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
Replacing columns cannot drop columns for table db_test.mytable. 
SerDe may be incompatible

I've thought about some solutions, but none of them fits my scenario:

First
- [Optional] Convert the table in external.
- Delete the table.
- Re-create the table with the fields that I want.
- MSCK REPAIR TABLE to add HDFS partitions.
- [Optional] Convert back to managed table.

Second
- Create temporary table as selection of the original table with the fields that I choose.
- Delete the original table.
- Rename the temporary table to the original name.

Both options affect my process because I would lose the statistics of my table. This table is consumed with MicroStrategy by Impala and I need to mantain the statistics.
In addition, the second solution is bad with very large data tables.

   

Any suggestions?
Thanks in advance.

1

1 Answers

0
votes

You can use first method and then run

hive> anayze table <db_name>.<table_name> compute statistics;

to compute all the statistics of the table.