2 votes

I have implemented a simple MapReduce project in Hadoop for processing logs. The input path is the directory where the logs are.

It works fine, but I would like to know how the input path of the logs is being processed at any given time in the class that implements the Mapper. The Mapper code is:

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class StatsMapper extends MapReduceBase implements Mapper<WritableComparable<Text>, Text, Text, Text> {

    public static final Log LOG = LogFactory.getLog(StatsMapper.class);

    public void configure(JobConf conf) {}

    public void map(WritableComparable<Text> key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // process(key, value) holds the application-specific log parsing (not shown here)
        process(key, value);
    }
}

Any idea?

Thanks in advance

1   What do you mean by 'how the input path of the log is being processed'? – Thomas Jungblut

1 Answer

2 votes

Read the InputFormat section here

How these input files are split up and read is defined by the InputFormat. An InputFormat is a class that provides the following functionality:

- Selects the files or other objects that should be used for input
- Defines the InputSplits that break a file into tasks
- Provides a factory for RecordReader objects that read the file
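
If the practical question is "which log file is this map task reading right now?", here is a minimal sketch using the same old org.apache.hadoop.mapred API as the question. It assumes a file-based InputFormat such as the default TextInputFormat (so the split handed to the task is a FileSplit); the class name PathAwareMapper is just illustrative and not from the original code.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PathAwareMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    // Kept only to show the configure-time alternative to the Reporter approach below.
    private String inputFile;

    public void configure(JobConf conf) {
        // For file-based InputFormats the framework sets this property
        // to the path of the file the current task is reading.
        inputFile = conf.get("map.input.file");
    }

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Alternative: ask the Reporter for the InputSplit handed to this task.
        // The cast to FileSplit only works for file-based InputFormats.
        FileSplit split = (FileSplit) reporter.getInputSplit();
        Path currentPath = split.getPath();

        // Emit the file name as the key so output can be grouped per log file.
        output.collect(new Text(currentPath.getName()), value);
    }
}

With the newer mapreduce API the equivalent would be casting context.getInputSplit() to FileSplit, but the sketch sticks to the mapred classes the question already uses.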