The Hive documentation talks about dynamic partitioning for external tables: https://cwiki.apache.org/confluence/display/Hive/HCatalog+DynamicPartitions.
I've recently upgraded to Hive 0.12 and want to use dynamic partitioning for an external table: I give the table's root HDFS location in the LOCATION clause of the DDL, and subdirectories added underneath it should then be picked up as partitions automatically. The subdirectories will be created by a Flume agent that uses the date as the path name, and I'd like the Hive table sitting on top of the root directory to pick up new data in those subdirectories automatically.
For example, if the root location is:
hdfs:///partitionTest
and I add data with Flume using the path
hdfs:///partitionTest/year=%Y/month=%m/day=%d
and my Hive DDL statement is:
create external table partitionTest (line String)
partitioned by (year int, month int, day int)
stored as sequencefile
location 'hdfs:///partitionTest';
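For reference, I check which partitions the metastore currently knows about with:
show partitions partitionTest;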
I still find myself having to add partitions manually, either with an ALTER TABLE statement or by inserting data into this table from another table containing the relevant partitioning fields (which does write the data into the correct directory structure on HDFS).
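This is roughly what I end up running by hand (a sketch only; the date values and the staging table name partitionTest_staging are placeholders):
-- register a directory Flume has already written as a partition
alter table partitionTest add partition (year=2014, month=1, day=15)
location 'hdfs:///partitionTest/year=2014/month=01/day=15';
-- or load via a dynamic-partition insert from a staging table
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table partitionTest partition (year, month, day)
select line, year, month, day from partitionTest_staging;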
Am I wrong in thinking that Hive 0.12 will let me write data into HDFS and have it automatically added to the table? Will I always have to add partitions manually for Hive to pick up the new data?