How to insert Hive partition column and value into data (parquet) file?

Question

Request:- How can I insert partition key pair into each parquet file while inserting data into Hive/Impala table.

Hive Table DDL [ create external table db.tbl_name ( col1 string, col2 string) Partitioned BY (date_col string) STORED AS parquet LOCATION 'hdfs_path/db/tbl_name' ]

Let's insert data into this hive table.

INSERT INTO db.tbl_name PARTITION (date_col=2020-07-26) VALUES ('test1_col1','test1_col2')

Once records get inserted, let's view data into parquet file using parquet-tools or any other tool.

parquet-tool cat hdfs_path/db/tbl_name/date_col=2020-07-26/parquet_file.parquet

Below would be the view.

**********************
col1 = test1_col1 
col2 = test1_col2
**********************

However, if I fire following HQL query on Hive/Impala, then it will read partition value from metadata.

**Query**- select * from db.tbl_name
**Result** -
col1        col2        date_col
test1_col1  test1_col2  2020-07-26

Question- Is there any way, where we can view partition columnn name and value in parquet file like below.

col1 = test1_col1 
col2 = test1_col2 
date_col = 2020-07-26

Hi @HappyCoder, You should show us the CREATE TABLE statement and a DATA SAMPLE to figure out what is happening. — Chema
Partition columns are not supposed to be inside parquet files, they are persisted externally as HDFS directories, e.g. hdfs://.../<table-name>/<partition-column>=<value>/part-xxx-file.parquet. — mazaneicha
Hi @mazaneicha - I want them inside parquet file as well. How can I achieve this ? Is there any property behind this ? — Peace_Dude
You'll need to add data_col_2 as a non-partition column, and have it contain the same date as data_col. There is no other way afaik. — mazaneicha

Koushik Roy Koushik Roy · Accepted Answer · 2020-08-10T14:26:15

Please use this -
INSERT INTO db.tbl_name PARTITION (date_col) VALUES ('test1_col1','test1_col2','2020-07-26');

Always mention partition name inside brackets() like above. And then in the values/select clause, order the partition column in the end.
Thats all you need to insert into hive/impala partitioned table.

How to insert Hive partition column and value into data (parquet) file?

1 Answers