1
votes

I need to create an external hive table on top of a csv file. CSV is having col1, col2, col3 and col4.

But my external hive table should be partitioned on month but my csv file doesn't have any month field. col1 is date field. How can I do this?

2

2 Answers

1
votes

You need to reload data into partitioned table.

  1. Create non-partitioned table (mytable) on top of folder with CSV.
  2. Create partitioned table (mytable_part)

    create table mytable_part( --columns specification here for col1, col2, col3, col4 ) partitioned by (part_month string) ... stored as textfile --you can chose any format you need

  3. Load data into partitioned table using dynamic partitioning, calculate partition column in the query:

    set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict;

    insert overwrite table mytable_part partition (part_month) select col1, col2, col3, col4, substr(col1, 1, 7) as part_month --partition column in yyyy-MM format from mytable distribute by substr(col1, 1, 7) --to reduce the number of files ;

1
votes

Try this way

Copy the csv data into a folder in HDFS location hdfs://somepath/5 and add that path to your external table as partition.

create external table ext1(
    col1   string
    ,col2  string
    ,col3  string
    ,col4  string
)
partition by (mm int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS ORC;

alter table ext1 add partition(mm = 5) location 'hdfs://yourpath/5';