1
votes

In Hive using ORC file format, it there a way to map the column names to the Hive table columns independent of the column order?

Example schema of orc file:

root
 |-- age: string (nullable = true)
 |-- first_name: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- salary: string (nullable = true) 
 |-- load_date: date (nullable = true) 

Desired Hive table column order:

first_name
last_name
age
salary

But when I create the table with desired column order, the data doesn't line up correctly. Hive is taking the data based on the column order, instead of column name. This works fine with Parquet format, where Hive is able to pick up the correct order based on the column name. Any suggestions to solve this wil be highly appreciated.

I dont have the option of changing the orc file schema as its created by an external process. Also dont want to create any intermediate temp table and insert overwrite option as well.

Hive Table DDL:

create external table if not exists test.user_profile(
first_name String,
last_name String,
age String,
salary String)
PARTITIONED BY (load_date String)
stored as orc
Location '/test/user_profile_data/';
1

1 Answers

0
votes

Schema evolution is supported in ORC from Hive 2.1. From Hive 2.3.x You can do both - positional evolution and schema evolution by setting this property orc.force.positional.evolution Note that default value for the said property is false