0
votes

In the Hive documentation it talks about dynamic partitioning for external tables: https://cwiki.apache.org/confluence/display/Hive/HCatalog+DynamicPartitions.

I've recently upgraded to Hive 0.12 and want to use the dynamic external table partitioning by giving the root HDFS location of a table in the location part of the DDL, then adding subdirectories which will then be automatically added to the table. The subdirectories will be created by a Flume agent which will add the date as the path name. I'd like the Hive table sitting on top of the root directory to be able to pick up new data in the subdirectories automatically.

For example, if the root location is:

hdfs:///partitionTest

and I add data with Flume using the path

hdfs:///partitionTest/year=%Y/month=%m/day=%d

Where my Hive DDL statement is:

create external table partitionTest (line String)  
partitioned by (year int, month int, day int) 
stored as sequencefile 
location 'hdfs:///partitionTest';

I still find myself having to add partitions manually using an alter table statement or inserting data into this table from another table containing the relevant partitioning fields - which does write data into the correct directory structure on the HDFS.

Am I wrong in thinking that Hive 0.12 will allow me to write data into the HDFS and will then automatically add it to the table? Will I always have to add partitions manually for Hive to pick up the new data?

2

2 Answers

0
votes

If your table is external and if it is static partition then yes you have to add the partitions(using alter command) every time a new partition is created.

If it a internal table and if it is a static partition then you have to either use the LOAD command or use insert into command.

If it is a dynamic partition then the last column in your select query will be the partition value. Till Hive 0.13 this is the procedure.

Hope it helps...!!!

0
votes

In case of manual addition of partition outside / not via hive command, run the repair table command to pick up partition. See if this does the trick

MSCK REPAIR TABLE table_name; or ALTER TABLE table_name RECOVER PARTITIONS;