2 votes

I have implemented a simple MapReduce project in Hadoop for processing logs. The input path is the directory where the logs are.

It works fine, but I would like to know how the input path of the logs is being processed at any given time in the class that implements the Mapper. The Mapper code is:

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class StatsMapper extends MapReduceBase implements Mapper<WritableComparable<Text>, Text, Text, Text> {

    public static final Log LOG = LogFactory.getLog(StatsMapper.class);

    public void configure(JobConf conf) {}

    public void map(WritableComparable<Text> key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // process(key, value) holds the application-specific log parsing (not shown here)
        process(key, value);
    }
}

Any idea?

Thanks in advance

1   What do you mean by 'how the input path of the log is being processed'? – Thomas Jungblut

1 Answer

2 votes

Read the InputFormat section here

How these input files are split up and read is defined by the InputFormat. An InputFormat is a class that provides the following functionality:

- Selects the files or other objects that should be used for input
- Defines the InputSplits that break a file into tasks
- Provides a factory for RecordReader objects that read the file
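
If the practical question is "which log file is this map task reading right now?", here is a minimal sketch using the same old org.apache.hadoop.mapred API as the question. It assumes a file-based InputFormat such as the default TextInputFormat (so the split handed to the task is a FileSplit); the class name PathAwareMapper is just illustrative and not from the original code.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PathAwareMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    // Kept only to show the configure-time alternative to the Reporter approach below.
    private String inputFile;

    public void configure(JobConf conf) {
        // For file-based InputFormats the framework sets this property
        // to the path of the file the current task is reading.
        inputFile = conf.get("map.input.file");
    }

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Alternative: ask the Reporter for the InputSplit handed to this task.
        // The cast to FileSplit only works for file-based InputFormats.
        FileSplit split = (FileSplit) reporter.getInputSplit();
        Path currentPath = split.getPath();

        // Emit the file name as the key so output can be grouped per log file.
        output.collect(new Text(currentPath.getName()), value);
    }
}

With the newer mapreduce API the equivalent would be casting context.getInputSplit() to FileSplit, but the sketch sticks to the mapred classes the question already uses.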