
I used this code to run the word count Hadoop job. WordCountDriver runs when I launch it from inside Eclipse with the Hadoop Eclipse plugin. It also runs from the command line when I package the mapper and reducer classes as a jar and put that jar on the classpath.

However, it fails if I run it from the command line without packaging the mapper and reducer classes as a jar, even though I added both class files to the classpath. Is there some restriction in Hadoop against accepting mapper and reducer classes as plain class files? Is creating a jar always mandatory?

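import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

    public static final String HADOOP_ROOT_DIR = "hdfs://universe:54310/app/hadoop/tmp";

    static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private Text word = new Text();
        private final IntWritable one = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line.toLowerCase());
            while (itr.hasMoreTokens()) {
                // emit (token, 1) for every token on the line
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

            int sum = 0;

            for (IntWritable value : values) {
                sum += value.get(); // process value
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public int run(String[] args) throws Exception {

        Configuration conf = getConf();

        conf.set("mapred.job.tracker", "universe:54311");

        Job job = new Job(conf, "Word Count");

        // specify output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // specify input and output dirs
        FileInputFormat.addInputPath(job, new Path(HADOOP_ROOT_DIR + "/input"));
        FileOutputFormat.setOutputPath(job, new Path(HADOOP_ROOT_DIR + "/output"));

        // specify a mapper
        job.setMapperClass(WordCountMapper.class);

        // specify a reducer
        job.setReducerClass(WordCountReducer.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCountDriver(), args);
        System.exit(res);
    }
}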


2 Answers


It's not entirely clear which classpath you're referring to, but in the end, if you're running on a remote Hadoop cluster, you need to provide all classes in a JAR file that is sent to Hadoop during the hadoop jar execution. The classpath of your local program is irrelevant to the tasks that run on the cluster.

It probably works locally because there you are actually running a Hadoop instance inside the local process, so it happens to find the classes on your local program's classpath.
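If the mapper and reducer only exist as loose .class files, they can be bundled into a jar before the job is submitted. Below is a minimal sketch using only java.util.jar from the JDK. The class name ClassDirJar and the paths passed on the command line are hypothetical, and this is just one way to do it, not something Hadoop requires you to do programmatically.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: bundles compiled classes into a jar.
final class ClassDirJar {

    // Writes every regular file under classesDir into jarFile, using
    // '/'-separated paths relative to classesDir as the entry names.
    static void bundle(Path classesDir, Path jarFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            // Collect first, so the jar being written is never picked up itself.
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out)) {
            for (Path file : files) {
                String name = classesDir.relativize(file).toString()
                        .replace(File.separatorChar, '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding the compiled .class files (assumed layout)
        // args[1]: jar file to create
        bundle(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The resulting jar is what you would then hand to hadoop jar, so that the cluster receives the mapper and reducer classes along with the driver.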


Adding classes to the Hadoop classpath makes them available on the client side only (i.e. to your driver).

Your mapper and reducer need to be available cluster-wide. To make this easier for Hadoop, bundle them into a jar and either identify that jar with the Job.setJarByClass(..) method, or add it to the job classpath using the -libjars option of GenericOptionsParser.
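To see why loose class files are not enough, it helps to check where a class was actually loaded from. As far as I understand it, setJarByClass looks up the class file as a resource and only succeeds if that resource lives inside a jar; with plain .class files in a directory there is no jar to find, so nothing gets shipped to the cluster. The sketch below mimics that check with the JDK alone. JobJarCheck and its methods are hypothetical names for illustration, not Hadoop's own code.

```java
import java.net.URL;

// Hypothetical helper, not Hadoop code: reports which jar (if any) holds a class.
final class JobJarCheck {

    // Returns the jar path embedded in a class-file URL, or null when the URL
    // does not point inside a jar (e.g. a loose .class file in a directory).
    static String jarPathOf(String classFileUrl) {
        if (!classFileUrl.startsWith("jar:")) {
            return null;
        }
        String path = classFileUrl.substring("jar:".length());
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!'); // "!/" separates the jar from the entry
        return bang >= 0 ? path.substring(0, bang) : path;
    }

    // Looks up the class file of clazz as a resource and extracts its jar, if any.
    static String findContainingJar(Class<?> clazz) {
        String classFile = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        URL url = (loader != null)
                ? loader.getResource(classFile)
                : ClassLoader.getSystemResource(classFile);
        return url == null ? null : jarPathOf(url.toString());
    }

    public static void main(String[] args) {
        String jar = findContainingJar(JobJarCheck.class);
        System.out.println(jar == null
                ? "loaded from loose class files, no containing jar"
                : "loaded from jar: " + jar);
    }
}
```

Running the same check against WordCountDriver from the command line should tell you whether there is a jar that could be sent along with the job in your setup.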