I need to create an external hive table on top of a csv file. CSV is having col1, col2, col3 and col4.
But my external hive table should be partitioned on month but my csv file doesn't have any month field. col1 is date field. How can I do this?
You need to reload data into partitioned table.
Create partitioned table (mytable_part)
create table mytable_part(
--columns specification here for col1, col2, col3, col4
)
partitioned by (part_month string) ...
stored as textfile --you can chose any format you need
Load data into partitioned table using dynamic partitioning, calculate partition column in the query:
set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table mytable_part partition (part_month)
select
col1, col2, col3, col4,
substr(col1, 1, 7) as part_month --partition column in yyyy-MM format
from mytable
distribute by substr(col1, 1, 7) --to reduce the number of files
;
Try this way
Copy the csv data into a folder in HDFS location hdfs://somepath/5 and add that path to your external table as partition.
create external table ext1(
col1 string
,col2 string
,col3 string
,col4 string
)
partition by (mm int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS ORC;
alter table ext1 add partition(mm = 5) location 'hdfs://yourpath/5';