
I am trying to run Hadoop with Streaming. I have two files: one is the Java file for the mapper and the other is the Python script for the reducer.

MerkleMapper.java

The class MerkleMapper extends MapReduceBase and defines the map() function. For each record of the input split, it reads the incoming key (byte_offset) and value (line) pair and outputs the byte_offset and the hash of the line.

The reducer is a Python script that combines all the hashes and produces a top hash.
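For illustration, a reducer of this kind might look roughly like the sketch below. It is not my exact script: it assumes each input line is `byte_offset<TAB>hex_hash`, that SHA-256 is the hash, and that an unpaired node at any level is paired with itself. The names `parse_records` and `top_hash` are made up for this example.

```python
#!/usr/bin/env python3
import hashlib
import sys


def parse_records(lines):
    """Parse 'byte_offset<TAB>hex_hash' lines into (offset, hash) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        offset, _, digest = line.partition("\t")
        records.append((int(offset), digest))
    return records


def top_hash(records):
    """Fold the leaf hashes pairwise until a single top hash remains."""
    # Streaming sorts keys as text, so re-sort numerically by byte offset.
    level = [digest for _, digest in sorted(records)]
    if not level:
        return None
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:
                # Assumed convention: an unpaired node is paired with itself.
                pair.append(pair[0])
            combined = "".join(pair).encode("ascii")
            next_level.append(hashlib.sha256(combined).hexdigest())
        level = next_level
    return level[0]


if __name__ == "__main__":
    root = top_hash(parse_records(sys.stdin))
    if root is not None:
        print(root)
```

All records have to reach the same reducer for this to yield one top hash, so the job would need to run with a single reduce task.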

Is it possible to combine the two (Java and Python)? How can I specify my Java file as the mapper using Streaming?


1 Answer


You could split it into two jobs.

The first job has only a mapper (your Java mapper). Take its output and pass it into a Python Streaming job, where the mapper is the identity mapper and the reducer is your Python reducer. As far as I know, you currently cannot combine Streaming and Java in a single job.
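A rough sketch of how the second (Streaming) job could be launched, written as a small Python helper that only builds the command line. The jar path, directory names, and script name are placeholders, and `cat` stands in for the identity mapper, which is a common Streaming idiom rather than anything specific to your setup. The helper name `streaming_job_args` is made up here.

```python
import os


def streaming_job_args(streaming_jar, input_dir, output_dir, reducer_script):
    """Build the argument list for the second job (identity map, Python reduce)."""
    script_name = os.path.basename(reducer_script)
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_dir,        # output directory of the Java map-only job
        "-output", output_dir,      # must not exist before the job runs
        "-mapper", "cat",           # passes every line through unchanged
        "-reducer", "python3 " + script_name,
        "-file", reducer_script,    # ships the local script to the task nodes
    ]


# Hypothetical paths; adjust them to your installation before running.
args = streaming_job_args(
    "/path/to/hadoop-streaming.jar",
    "merkle/mapper-output",
    "merkle/top-hash",
    "./merkle_reducer.py",
)
print(" ".join(args))
```

The list can be handed to `subprocess.run(args, check=True)` once the first job has finished and its output directory exists.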