
I am trying to write a MapReduce job that calculates the distribution of field values in a Hive table (Hadoop 2.2.0.2.0.6.0-101). For example:

Input Hive table "ATable":

+-------+--------+
| name  | rating |
+-------+--------+
| Bond  | 7      |
| Megre | 2      |
| Holms | 11     |
| Puaro | 7      |
| Holms | 1      |
| Puaro | 7      |
| Megre | 2      |
| Puaro | 7      |
+-------+--------+

The MapReduce job should generate the following output table, also in Hive:

+--------+-------+--------+
| Field  | Value |  Count |
+--------+-------+--------+
| name   | Bond  |   1    |
| name   | Puaro |   3    |
| name   | Megre |   2    |
| name   | Holms |   2    |
| rating | 7     |   4    |
| rating | 11    |   1    |
| rating | 1     |   1    |
| rating | 2     |   2    |
+--------+-------+--------+
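
To make the intended computation concrete, here is a minimal plain-Java sketch of the counting logic, with no Hadoop or HCatalog involved. It assumes each row is already available as an array of strings in column order; the class name `FieldDistribution` and the method `distribution` are hypothetical names chosen for illustration. In the real job, the key construction would sit in the mapper and the summing in the reducer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FieldDistribution {

    // Counts how often each (field, value) pair occurs across all rows.
    // Keys have the form "field<TAB>value"; insertion order is preserved.
    static Map<String, Integer> distribution(String[] fields, String[][] rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            for (int i = 0; i < fields.length; i++) {
                // "Map" step: build the composite key for this column value.
                String key = fields[i] + "\t" + row[i];
                // "Reduce" step: add 1 to the running total for that key.
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] fields = {"name", "rating"};
        String[][] rows = {
            {"Bond", "7"}, {"Megre", "2"}, {"Holms", "11"}, {"Puaro", "7"},
            {"Holms", "1"}, {"Puaro", "7"}, {"Megre", "2"}, {"Puaro", "7"}
        };
        // Print one line per (field, value) pair: field, value, count.
        distribution(fields, rows)
            .forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

The part I cannot do without HCatalog is obtaining the `fields` array (the column names) inside the mapper, which is why I need the table schema there.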

To get the field names and values I need access to the HCatalog metadata, so that I can use them in the map method (org.apache.hadoop.mapreduce.Mapper). For this I am trying to adapt the example from: http://java.dzone.com/articles/mapreduce-hive-tables-using

The code from this example compiles but produces a lot of deprecation warnings:

protected void map(WritableComparable key, HCatRecord value,
 org.apache.hadoop.mapreduce.Mapper.Context context)
 throws IOException, InterruptedException {

 // Get table schema
 HCatSchema schema = HCatBaseInputFormat.getTableSchema(context);

 Integer year = new Integer(value.getString("year", schema));
 Integer month = new Integer(value.getString("month", schema));
 Integer DayofMonth = value.getInteger("dayofmonth", schema);

 context.write(new IntWritable(month), new IntWritable(DayofMonth));
}

Deprecation warnings:

HCatRecord
HCatSchema 
HCatBaseInputFormat.getTableSchema

Where can I find a similar example of using HCatalog in MapReduce with the latest, non-deprecated interfaces?

Thanks!


1 Answer


I used one of the Cloudera examples and the framework described in this blog to compile my code. I also had to add the Maven dependency for HCatalog to the pom.xml. This example uses the new mapreduce APIs and not the deprecated mapred ones. Hope it helps.

        <dependency>
            <groupId>org.apache.hcatalog</groupId>
            <artifactId>hcatalog-core</artifactId>
            <version>0.11.0</version>
        </dependency>