0
votes

I have created an external table in Hive pointing to a gzip file:

CREATE EXTERNAL TABLE IF NOT EXISTS raw_CN (
  column1  string,
  column2  string,
  column3  string,
  column4  string,
  column5  string,
  column6  string,
  column7  string,
  column8  string,
  column9  string,
  column10 string
)
PARTITIONED BY (day_id string, file_type string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

Added the partition:

ALTER TABLE raw_CN ADD PARTITION (day_id = '20140815', file_type = 'Daily') LOCATION '/mapr/mapr.cluster/CN/20140501/Daily';

Then I placed the gzip file at the above location.

However, when I query the table, the first row also contains some file-level information, even though the file has no header. The remaining rows are fine. How do I get rid of this extra information in the first row?

Vendor1_617_CN_Daily.201408150000664000202600020260243475554512373676764017202 0ustar  fworksfworks4F06C1A123456|82910|26|ESPN2|ESPN2|2014/08/15 01:09:42|2014/08/15     01:10:13|233|53066|Jefferson-Walworth (Jefferson), WI
123456|82910|8|WMLW|WMLW|2014/08/15 03:16:53||233|53066|Jefferson-Walworth (Jefferson), WI
123456|82910|3|WITI|WITI|2014/08/15 14:34:13|2014/08/15 14:35:20|233|53066|Jefferson-Walworth (Jefferson), WI
123456|82910|43|HGTV|Home & Garden Television (East)|2014/08/15 14:35:20|2014/08/15 14:37:00|233|53066|Jefferson-Walworth (Jefferson), WI
1
Hello! Did you solve the problem? dbustosp

1 Answer

1
votes

That depends on what version of Hive you are using.

For Hive 0.13 and above:

There is a table property, tblproperties ("skip.header.line.count"="1"), which you can set when creating the table. Hive then skips that number of lines at the start of each file.
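
As a rough sketch of this option applied to the table from the question: the property can be given in the CREATE statement or, for a table that already exists, set afterwards with ALTER TABLE. The name raw_CN_skip is made up for illustration only, and the column list is shortened.

```sql
-- Option 1: declare the property when creating the table.
-- raw_CN_skip is a hypothetical name; only two columns are shown.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_CN_skip (
  column1 string,
  column2 string
)
PARTITIONED BY (day_id string, file_type string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");

-- Option 2: set the property on the existing table from the question.
ALTER TABLE raw_CN SET TBLPROPERTIES ("skip.header.line.count"="1");
```

Note that in the sample output in the question the file-level text and the first record sit on the same line, so skipping one line would drop that record as well.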

For Hive 0.12 and below:

You need to remove the header line manually or with a shell or Python script.

Hope it helps...!!!