0 votes

I'm using a single-node cluster (Hadoop 2.7.0) on my Linux machine. My WordCount job runs fine with 1 reducer, but it fails when I increase the number of reducers. It shows the following error:

15/05/25 21:15:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/05/25 21:15:10 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/05/25 21:15:10 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/05/25 21:15:10 WARN snappy.LoadSnappy: Snappy native library is available
15/05/25 21:15:10 INFO snappy.LoadSnappy: Snappy native library loaded
15/05/25 21:15:10 INFO mapred.FileInputFormat: Total input paths to process : 1
15/05/25 21:15:10 INFO mapred.JobClient: Running job: job_local_0001
15/05/25 21:15:11 INFO util.ProcessTree: setsid exited with exit code 0
15/05/25 21:15:11 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5f1fd699
15/05/25 21:15:11 INFO mapred.MapTask: numReduceTasks: 1
15/05/25 21:15:11 INFO mapred.MapTask: io.sort.mb = 100
15/05/25 21:15:11 INFO mapred.MapTask: data buffer = 79691776/99614720
15/05/25 21:15:11 INFO mapred.MapTask: record buffer = 262144/327680

15/05/25 21:15:11 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Illegal partition for am (1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
    at WordMapper.map(WordMapper.java:24)
    at WordMapper.map(WordMapper.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

My getPartition method looks like this:

public int getPartition(Text key, IntWritable value, int numRedTasks) {
    String s = key.toString();
    if (s.length() == 1) {
        return 0;
    } else if (s.length() == 2) {
        return 1;
    } else if (s.length() == 3) {
        return 2;
    } else {
        return 3;
    }
}
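
For comparison, a bounds-checked variant (a hypothetical sketch, not my actual class; the name SafeLengthPartitioner is made up) would take the bucket index modulo numRedTasks, so it can never trip the "Illegal partition" check even when fewer reducers are in effect:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class SafeLengthPartitioner implements Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numRedTasks) {
        int bucket = Math.min(key.toString().length(), 4) - 1; // lengths 1..4+ map to buckets 0..3
        if (bucket < 0) bucket = 0;                            // guard against empty keys
        return bucket % numRedTasks;                           // always within [0, numRedTasks)
    }

    @Override
    public void configure(JobConf job) {
        // no per-job configuration needed
    }
}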

The run method in my WordCount driver class:

public int run(String[] input) throws Exception {
    if (input.length < 2) {
        System.out.println("Please provide valid input");
        return -1;
    } else {
        JobConf config = new JobConf();
        FileInputFormat.setInputPaths(config, new Path(input[0]));
        FileOutputFormat.setOutputPath(config, new Path(input[1]));
        config.setMapperClass(WordMapper.class);
        config.setReducerClass(WordReducer.class);
        config.setNumReduceTasks(4);
        config.setPartitionerClass(MyPartitioner.class);
        config.setMapOutputKeyClass(Text.class);
        config.setMapOutputValueClass(IntWritable.class);
        config.setOutputKeyClass(Text.class);
        config.setOutputValueClass(IntWritable.class);
        JobClient.runJob(config);
    }
    return 0;
}
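
Incidentally, the log also warns "No job jar file set" and points at JobConf(Class). A minimal sketch of using that constructor (assuming the driver class is WordCount) would be:

JobConf config = new JobConf(WordCount.class); // locates the job jar from the class's own jar
config.setJobName("wordcount");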

My mapper and reducer code is fine, because the WordCount job runs fine with 1 reducer. Note that the log above shows numReduceTasks: 1 even though I call config.setNumReduceTasks(4). Is anyone able to figure it out?


2 Answers

0 votes

This may be because a high default_parallel value has been set, which makes Pig fail in the operation.

Thanks, Shailesh.

0 votes

You need to use ToolRunner in your driver class and invoke it from your main method. You can also make a combiner part of the workflow. The driver class code is below: as you can see, along with the mapper and reducer calls there is a combiner call as well. The exit code in the main method is "int exitCode = ToolRunner.run(new Configuration(), new WordCountWithCombiner(), args);", which invokes ToolRunner at run time, and that lets you specify the number of reducers or mappers with the "-D" option when running the WordCount program. A sample command line would look like "-D mapred.reduce.tasks=2 input output".

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountWithCombiner extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        Job job = new Job(conf, "MyJob");
        job.setJarByClass(WordCount.class);
        job.setJobName("Word Count With Combiners");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class); // the reducer doubles as a combiner
        job.setReducerClass(WordCountReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new WordCountWithCombiner(), args);
        System.exit(exitCode);
    }
}
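
Assuming the class is packaged into a jar named wordcount.jar (the jar name here is hypothetical), the full invocation with two reducers would look like:

hadoop jar wordcount.jar WordCountWithCombiner -D mapred.reduce.tasks=2 input output

Note that the -D option is only picked up because the driver goes through ToolRunner (which applies GenericOptionsParser), and it must come before the input and output arguments.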