Request:- How can I insert partition key pair into each parquet
file while inserting data into Hive/Impala
table.
Hive Table DDL [ create external table db.tbl_name ( col1 string, col2 string) Partitioned BY (date_col string) STORED AS parquet LOCATION 'hdfs_path/db/tbl_name' ]
Let's insert data into this hive table.
INSERT INTO db.tbl_name PARTITION (date_col=2020-07-26) VALUES ('test1_col1','test1_col2')
Once records get inserted, let's view data into parquet file using parquet-tools or any other tool.
parquet-tool cat hdfs_path/db/tbl_name/date_col=2020-07-26/parquet_file.parquet
Below would be the view.
**********************
col1 = test1_col1
col2 = test1_col2
**********************
However, if I fire following HQL query on Hive/Impala, then it will read partition value from metadata.
**Query**- select * from db.tbl_name
**Result** -
col1 col2 date_col
test1_col1 test1_col2 2020-07-26
Question- Is there any way, where we can view partition columnn name and value in parquet file like below.
col1 = test1_col1
col2 = test1_col2
date_col = 2020-07-26
CREATE TABLE
statement and aDATA SAMPLE
to figure out what is happening. – Chemahdfs://.../<table-name>/<partition-column>=<value>/part-xxx-file.parquet
. – mazaneichadata_col_2
as a non-partition column, and have it contain the same date asdata_col
. There is no other way afaik. – mazaneicha