I'm currently running MSCK REPAIR TABLE schema.tablename for all my tables after data is loaded.
As the number of partitions grows, this statement is taking much longer (sometimes more than 5 minutes) for a single table. I know it scans and parses all the partitions in S3 (where my data is) and then adds the latest partitions to the Hive metastore.
I want to replace MSCK REPAIR with an ALTER TABLE ADD PARTITION statement. MSCK REPAIR works perfectly fine for adding the latest partitions; however, I'm facing a problem with the TIMESTAMP value in the partition when using ALTER TABLE ADD PARTITION.
I have a table with four partition columns (part_dt STRING, part_src STRING, part_src_file STRING, part_ldts TIMESTAMP).
After running MSCK REPAIR, the SHOW PARTITIONS command gives me the output below:
hive> show partitions hub_cont;
OK
part_dt=20181016/part_src=asfs/part_src_file=kjui/part_ldts=2019-05-02 06%3A30%3A39
But when I drop the above partition from the metastore and recreate it using ALTER TABLE ADD PARTITION:
hive> alter table hub_cont add partition(part_dt='20181016',part_src='asfs',part_src_file='kjui',part_ldts='2019-05-02 06:30:39');
OK
Time taken: 1.595 seconds
hive> show partitions hub_cont;
OK
part_dt=20181016/part_src=asfs/part_src_file=kjui/part_ldts=2019-05-02 06%3A30%3A39.0
Time taken: 0.128 seconds, Fetched: 1 row(s)
It is adding .0 at the end of the timestamp value. When I query the table for this partition, it returns 0 records.
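To make the difference concrete, here is a small Python sketch (not Hive code) that splits the two partition names from the SHOW PARTITIONS outputs above and percent-decodes each value, since %3A is the percent-encoded form of ":". The helper name parse_partition_name is hypothetical and only used for illustration; the sketch assumes the names use plain percent-encoding, which is what the output suggests.

```python
from urllib.parse import unquote

# Partition names copied from the two SHOW PARTITIONS outputs above.
msck_name = "part_dt=20181016/part_src=asfs/part_src_file=kjui/part_ldts=2019-05-02 06%3A30%3A39"
alter_name = "part_dt=20181016/part_src=asfs/part_src_file=kjui/part_ldts=2019-05-02 06%3A30%3A39.0"


def parse_partition_name(name):
    """Split a key=value/key=value partition name into {column: decoded value}.

    Hypothetical helper for illustration; it only percent-decodes the values.
    """
    spec = {}
    for part in name.split("/"):
        key, _, value = part.partition("=")
        spec[key] = unquote(value)  # "%3A" becomes ":"
    return spec


msck_spec = parse_partition_name(msck_name)
alter_spec = parse_partition_name(alter_name)

# Keep only the columns whose decoded values differ between the two names.
differing = {
    col: (msck_spec[col], alter_spec[col])
    for col in msck_spec
    if msck_spec[col] != alter_spec[col]
}
print(differing)
```

Only part_ldts differs, and only by the trailing ".0", so the two names would correspond to different partition directory names even though they represent the same timestamp.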
Is there a way to add a partition that has a timestamp value without getting this .0 appended at the end? I'm unable to figure out how MSCK REPAIR handles this case when the ALTER TABLE statement cannot.