So, I have a table whose data is partitioned by date (dt) and stored in S3, where the partitions look like this:
dt=2019-03-22/
dt=2019-03-23/
dt=2019-03-24/
and so on. What I want to do is change the partitioning from this pattern to sub-partitions like this:
year=2019/month=03/day=22/
year=2019/month=03/day=23/
year=2019/month=03/day=24/
But I don't want to alter the original table, so I created an external table pointing to a different S3 location that will hold the new partition layout. I tried creating that table (with the same schema as the original one) like this:
CREATE EXTERNAL TABLE `test_partition_new`(
`order_id` string,
`outlet_code` string,
.
.
.
.
`business_date` string,
.
.
.
.
)
PARTITIONED BY (
`year` string,
`month` string,
`day` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
's3://data-test/test_partition/db.new_partition/'
TBLPROPERTIES (
'orc.compress'='SNAPPY'
)
which partitions by year, month and day respectively. From what I understand, I should now insert the data from the original table into this new one. How should I insert it so that the partition values are derived from the column 'business_date', which contains values like '2019-03-20'? Is there a function that can split this column into three columns containing the year, month and day?
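To make the question concrete, this is roughly the kind of statement I have in mind (only a sketch, not tested). I'm assuming Hive-style dynamic partitioning is available and that substr() can slice the 'business_date' string; 'test_partition_original' is just a placeholder for my actual source table name.

-- Enable dynamic partitioning so year/month/day are taken from the SELECT output
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE test_partition_new PARTITION (year, month, day)
SELECT
  order_id,
  outlet_code,
  -- ... remaining columns in the same order as the table definition ...
  business_date,
  substr(business_date, 1, 4) AS year,   -- '2019'
  substr(business_date, 6, 2) AS month,  -- '03'
  substr(business_date, 9, 2) AS day     -- '20'
FROM test_partition_original;            -- placeholder for the original table

Would something like this work, or is there a better built-in way to split the date?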