3 votes

I am new to Hive and MapReduce and would really appreciate your answer, and please also suggest the right approach.

I have defined an external table logs in Hive, partitioned on date and origin server, with an external location on HDFS at /data/logs/. I have a MapReduce job which fetches these log files, splits them, and stores them under the folder mentioned above, like:

"/data/logs/dt=2012-10-01/server01/"
"/data/logs/dt=2012-10-01/server02/"
...
...

From the MapReduce job I would like to add partitions to the table logs in Hive. I know of two approaches:

  1. alter table command -- too many alter table commands are needed (see the sketch below)
  2. adding dynamic partitions
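
For reference, approach 1 would look roughly like the sketch below, run from the driver once the job has finished. The JDBC URL and the name of the server partition column are guesses on my part, since only the dt directories follow the key=value convention:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddLogPartitions {
    public static void main(String[] args) throws Exception {
        // Assumes HiveServer2 with the hive-jdbc driver on the classpath
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = con.createStatement()) {
            // One ALTER TABLE statement per (dt, server) folder produced by the job
            String[][] parts = {{"2012-10-01", "server01"}, {"2012-10-01", "server02"}};
            for (String[] p : parts) {
                stmt.execute("ALTER TABLE logs ADD IF NOT EXISTS PARTITION "
                        + "(dt='" + p[0] + "', server='" + p[1] + "') "
                        + "LOCATION '/data/logs/dt=" + p[0] + "/" + p[1] + "'");
            }
        }
    }
}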

For approach two I see only examples of INSERT OVERWRITE, which is not an option for me. Is there a way to add these new partitions to the table after the end of the job?


3 Answers

3 votes

To do this from within a Map/Reduce job I would recommend using Apache HCatalog, which is a new project under the Hadoop umbrella.

HCatalog is really an abstraction layer on top of HDFS, so you can write your outputs in a standardized way, be it from Hive, Pig or M/R. Where this comes into the picture for you is that you can load data directly into Hive from your Map/Reduce job using the output format HCatOutputFormat. Below is an example taken from the official website.

A current code example for writing out a specific partition for (a=1,b=1) would go something like this:

// Describe the target table and the static partition (a=1, b=1) to write to
Map<String, String> partitionValues = new HashMap<String, String>();
partitionValues.put("a", "1");
partitionValues.put("b", "1");
HCatTableInfo info = HCatTableInfo.getOutputTableInfo(dbName, tblName, partitionValues);
HCatOutputFormat.setOutput(job, info);
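
To turn that into a complete job setup you also have to hand HCatOutputFormat the schema of the records being written and register it as the job's output format. Roughly like this (the exact method signatures differ a little between HCatalog releases):

// Continuing the setup above
HCatSchema schema = HCatOutputFormat.getTableSchema(job.getConfiguration());
HCatOutputFormat.setSchema(job, schema);
job.setOutputFormatClass(HCatOutputFormat.class);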

And to write to multiple partitions, a separate job has to be kicked off for each, with the setup above.

You can also use dynamic partitions with HCatalog, in which case you could load as many partitions as you want in the same job!
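
As far as I recall, the switch to dynamic partitioning is simply a matter of not pinning the partition values when you describe the output. With the newer OutputJobInfo API that means passing null for the partition map; a minimal sketch, assuming dbName and tblName hold your database and table names:

// A null partition spec tells HCatOutputFormat to partition dynamically,
// based on the partition column values carried in each output record
OutputJobInfo dynInfo = OutputJobInfo.create(dbName, tblName, null);
HCatOutputFormat.setOutput(job, dynInfo);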

I recommend reading further about HCatalog on the official website mentioned above, which should give you more details if needed.

3 votes

In reality, things are a little more complicated than that, which is unfortunate because it is undocumented in official sources (as of now), and it takes a few days of frustration to figure out.

I've found that I need to do the following to get HCatalog MapReduce jobs to write to dynamic partitions:

In the record-writing phase of my job (usually the reducer), I have to manually add my dynamic partition fields (HCatFieldSchema) to my HCatSchema object.

The trouble is that HCatOutputFormat.getTableSchema(config) does not actually return the partition fields. They need to be added manually:

HCatFieldSchema hfs1 = new HCatFieldSchema("date", Type.STRING, null);
HCatFieldSchema hfs2 = new HCatFieldSchema("some_partition", Type.STRING, null);
schema.append(hfs1);
schema.append(hfs2);
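
Putting it together, this is roughly what my reducer ends up looking like: fetch the table schema once in setup(), append the partition fields, and make sure every record carries values for them in the trailing positions. Imports are omitted, the package is org.apache.hcatalog or org.apache.hive.hcatalog depending on your version, and the column names and values are just placeholders:

public static class MyReducer extends Reducer<Text, Text, WritableComparable<?>, HCatRecord> {
    private HCatSchema schema;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // getTableSchema() returns only the data columns, so the dynamic
        // partition columns have to be appended by hand
        schema = HCatOutputFormat.getTableSchema(context.getConfiguration());
        schema.append(new HCatFieldSchema("date", Type.STRING, null));
        schema.append(new HCatFieldSchema("some_partition", Type.STRING, null));
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        HCatRecord record = new DefaultHCatRecord(schema.getFields().size());
        record.set(0, key.toString());   // data column
        record.set(1, "2012-10-01");     // value for the date partition
        record.set(2, "server01");       // value for the some_partition partition
        context.write(null, record);
    }
}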
0 votes

Here's the code for writing into multiple tables with dynamic partitioning in one job using HCatalog. The code has been tested on Hadoop 2.5.0 and Hive 0.13.1:

// ... Job setup, InputFormatClass, etc ...
String dbName = null; // null means the default database
String[] tables = {"table0", "table1"};

job.setOutputFormatClass(MultiOutputFormat.class);
MultiOutputFormat.JobConfigurer configurer = MultiOutputFormat.createConfigurer(job);

List<String> partitions = new ArrayList<String>();
partitions.add(0, "partition0");
partitions.add(1, "partition1");

HCatFieldSchema partition0 = new HCatFieldSchema("partition0", TypeInfoFactory.stringTypeInfo, null);
HCatFieldSchema partition1 = new HCatFieldSchema("partition1", TypeInfoFactory.stringTypeInfo, null);

for (String table : tables) {
    configurer.addOutputFormat(table, HCatOutputFormat.class, BytesWritable.class, HCatRecord.class);

    // A null static partition spec plus explicit dynamic partitioning keys lets
    // HCatalog derive the partitions from the records themselves
    OutputJobInfo outputJobInfo = OutputJobInfo.create(dbName, table, null);
    outputJobInfo.setDynamicPartitioningKeys(partitions);

    HCatOutputFormat.setOutput(
        configurer.getJob(table), outputJobInfo
    );

    // getTableSchema() returns only the data columns; append the partition columns
    HCatSchema schema = HCatOutputFormat.getTableSchema(configurer.getJob(table).getConfiguration());
    schema.append(partition0);
    schema.append(partition1);

    HCatOutputFormat.setSchema(
        configurer.getJob(table),
        schema
    );
}
configurer.configure();

return job.waitForCompletion(true) ? 0 : 1;

Mapper:

public static class MyMapper extends Mapper<LongWritable, Text, BytesWritable, HCatRecord> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        HCatRecord record = new DefaultHCatRecord(3); // Including partitions
        record.set(0, value.toString());

        // partitions must be set after non-partition fields
        record.set(1, "0"); // partition0=0
        record.set(2, "1"); // partition1=1

        MultiOutputFormat.write("table0", null, record, context);
        MultiOutputFormat.write("table1", null, record, context);
    }
}
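
With this setup HCatalog takes the partition values from the records themselves, writes the output under the matching partition directories and registers the new partitions in the metastore when the job commits, so nothing needs to be altered by hand after the job ends.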