I am having flat files being delivered to hdfs. The general structure of files are as below:
<header - 2 rows>
<data>
<footer - 1 row>
I am having an external hive table build on top of this dataset. Below is my hive ddl:
create external table ext_test
(
id string,
name string,
age string
) row format DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '<hdfs file location>'
TBLPROPERTIES ('skip.footer.line.count'='1', 'skip.header.line.count'='2')
When i query select * from ext_test in HIVE; i am getting the exact number of rows as expected from the external table. But when i run the exact same query in IMPALA, i receive one additional row which is basically the footer property.
e.g: The below example will clear out the scenario:
Sample Data in the Feed:
Header,Sample-03122018,
ID,NAME,AGE
1,R,10
2,RR,11
3,RRR,12
Footer,End of Report,
Output when i run select command in HIVE using the above DDL [Expected Output]:
1,R,10
2,RR,11
3,RRR,12
Output when i run select command in IMPALA using the above DDL: [Issue]
1,R,10
2,RR,11
3,RRR,12
Footer,End of Report,Null
Did you face similar issue? If yes, please advice me of the suitable solution.