How to read a Hadoop SequenceFile as the input to a Hadoop job?

Question

I have a SequenceFile whose key-value pairs are of type "org.apache.hadoop.typedbytes.TypedBytesWritable". I have to provide this file as the input to a Hadoop job and process it in the map phase only. In other words, I don't need to do anything that would require a reduce.

1) How do I specify SequenceFile as the input format?

2) What will be the signature of the map function?

3) How do I get the output from the map instead of the reduce?

Praveen Sripati · Accepted Answer · 2012-01-11T14:26:10

1) How do I specify SequenceFile as the input format?

Set SequenceFileAsBinaryInputFormat as the input format. It reads the keys and values of the SequenceFile in their raw binary form. Here is the code:

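// Inside run() of a class that extends Configured and implements Tool
// (old org.apache.hadoop.mapred API)
JobConf conf = new JobConf(getConf(), getClass());
conf.setInputFormat(SequenceFileAsBinaryInputFormat.class);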

2) What will be the signature of the map function?

The map function will be invoked with BytesWritable as both the key type and the value type.
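A minimal sketch of a mapper with that signature, assuming the old org.apache.hadoop.mapred API that JobConf belongs to. The class name RawRecordLengthMapper and the choice to emit the byte length of each key and value are hypothetical placeholders for real processing. The BytesWritable objects hold the raw serialized bytes of each record, so decoding the TypedBytesWritable contents is left out here.

```java
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper for SequenceFileAsBinaryInputFormat (old mapred API).
// Input: raw key bytes and raw value bytes of each SequenceFile record.
// Output: (key length, value length) as a placeholder for real processing.
public class RawRecordLengthMapper extends MapReduceBase
        implements Mapper<BytesWritable, BytesWritable, IntWritable, IntWritable> {

    // Reused across calls, a common Hadoop idiom to avoid per-record allocation.
    private final IntWritable keyLength = new IntWritable();
    private final IntWritable valueLength = new IntWritable();

    @Override
    public void map(BytesWritable key, BytesWritable value,
                    OutputCollector<IntWritable, IntWritable> output,
                    Reporter reporter) throws IOException {
        // getLength() is the number of valid bytes; the array returned by
        // getBytes() can be larger, so its .length must not be used.
        keyLength.set(key.getLength());
        valueLength.set(value.getLength());
        output.collect(keyLength, valueLength);
    }
}
```

The mapper is registered with conf.setMapperClass(RawRecordLengthMapper.class), and the job's output key and value classes would then be IntWritable.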

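3) How do I get the output from the map instead of the reduce?

Set the mapred.reduce.tasks property to 0 (for example, with conf.setNumReduceTasks(0)). The job then has no reduce phase, and the output of the map tasks is written directly as the final output of the job.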
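Putting the three answers together, a map-only driver could look roughly like this. It is a sketch assuming the old org.apache.hadoop.mapred API; the class name MapOnlySequenceFileJob, the job name, and the use of IdentityMapper and TextOutputFormat are illustrative choices, not requirements.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical map-only driver (old mapred API): reads a SequenceFile as raw
// bytes and writes the map output straight to the output path, no reduce phase.
public class MapOnlySequenceFileJob extends Configured implements Tool {

    // Builds the job configuration; kept separate from run() so it can be inspected.
    public static JobConf configure(Configuration base, String input, String output) {
        JobConf conf = new JobConf(base, MapOnlySequenceFileJob.class);
        conf.setJobName("map-only sequence file job");
        conf.setInputFormat(SequenceFileAsBinaryInputFormat.class);
        // IdentityMapper passes each (key, value) pair through unchanged;
        // replace it with your own Mapper<BytesWritable, BytesWritable, ...>.
        conf.setMapperClass(IdentityMapper.class);
        conf.setOutputKeyClass(BytesWritable.class);
        conf.setOutputValueClass(BytesWritable.class);
        // Same effect as mapred.reduce.tasks=0: the map output is the job output.
        conf.setNumReduceTasks(0);
        // TextOutputFormat writes each BytesWritable via toString(), i.e. as hex.
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));
        return conf;
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MapOnlySequenceFileJob <input> <output>");
            return 2;
        }
        JobClient.runJob(configure(getConf(), args[0], args[1]));
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MapOnlySequenceFileJob(), args));
    }
}
```

Because the driver goes through ToolRunner, the reducer count can also be set from the command line with the generic option -D mapred.reduce.tasks=0 instead of calling setNumReduceTasks in code.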

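Also, take a look at SequenceFileAsTextInputFormat, which converts each key and value to Text by calling its toString() method. With it, the map function is invoked with Text as both the key type and the value type.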
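For comparison, a mapper for SequenceFileAsTextInputFormat takes Text keys and values. This is a sketch assuming the old org.apache.hadoop.mapred API; the class name NonEmptyTextMapper and its filtering rule (drop records whose value text is empty) are hypothetical stand-ins for real logic.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper for SequenceFileAsTextInputFormat (old mapred API).
// The input format has already turned each key and value into Text via
// toString(); this sketch passes records through unless the value is empty.
public class NonEmptyTextMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, Text> {

    @Override
    public void map(Text key, Text value,
                    OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
        // getLength() is the number of UTF-8 bytes in the Text.
        if (value.getLength() > 0) {
            output.collect(key, value);
        }
    }
}
```

In the driver this would pair with conf.setInputFormat(SequenceFileAsTextInputFormat.class), conf.setMapperClass(NonEmptyTextMapper.class), and Text as the output key and value classes.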
