I am working with Hive's dynamic partitions, and I am having an issue with a column not being populated with data unless a new partition is added. I have created a small demo example to demonstrate.
1. create table vegetables (name string, count bigint) partitioned by (year int, month int, day int); 2. create external table dataForVeg (name string, count bigint, weight string, year int, month int, day int) row format delimited fields terminated by ' '; 3. load data1 into dataforveg 4. set hive.exec.dynamic.partition.mode=nonstrict; 5. insert into table vegetables partition(year, month, day) select name, count, year, month, day from dataforveg; 6. hive> select * from vegetables where day='5'; tomato 5 2013 11 5 cabbage 3 2013 11 5 7. hive> alter table vegetables add columns(weight double); 8. hive> describe vegetables ; name string count bigint weight double year int month int day int 9. hive> select * from vegetables where day='5'; tomato 5 NULL 2013 11 5 cabbage 3 NULL 2013 11 5 hive> select * from vegetables where day='4'; potato 2 NULL 2013 11 4 10. load overwrite data2 into dataforveg 11. hive> select * from dataforveg; carrot 10 5 2013 11 5 pepper 15 2 2013 11 5 12. hive> select * from vegetables where day='5'; tomato 5 NULL 2013 11 5 cabbage 3 NULL 2013 11 5 carrot 10 NULL 2013 11 5 pepper 15 NULL 2013 11 5 13. load overwrite data3 into dataforveg hive> select * from dataforveg; beet 4 1 2013 11 6 broccoli 3 1 2013 11 6 14. hive> select * from vegetables; potato 2 NULL 2013 11 4 tomato 5 NULL 2013 11 5 cabbage 3 NULL 2013 11 5 carrot 10 NULL 2013 11 5 pepper 15 NULL 2013 11 5 beet 4 1.0 2013 11 6 broccoli 3 1.0 2013 11 6
As you can see from the example, data is being updated when you add a new partition. Question: is there a way to refresh the value of a new field 'weight' for carrot and pepper in step 12? In other words, is there a way to fill the newly added columns with data in an already existing partition?