I have a MapReduce job that exports the plain text of an HBase table. I'm emulating the Export class that ships with HBase, so the job is map-only and runs no reducers. In addition, I'm just writing an empty String for the key. Something like this:
    public void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
        List<Cell> cells = value.listCells();
        for (Cell cell : cells) {
            context.write(new Text(""), new Text(CellUtil.cloneValue(cell)));
        }
    }
This works fine, except that the number of output map files (e.g. part-m-NNNNN) is at the mercy of however many splits the HBase table has.
Is there a way to combine the output map files within the MapReduce job itself?
I've considered using a random integer between 1 and 50 for the key and adding a reducer that strips the key before writing out to HDFS, but this seems like a hack.
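To make that idea concrete, here is a minimal plain-Java sketch of the bucketing logic only. It uses no Hadoop or HBase classes, and `BucketedExportSketch`, `assignBuckets`, and `stripKeys` are made-up names for illustration. It models the map step as tagging each value with a random bucket and the reduce step as dropping the bucket key, so the number of outputs is bounded by the number of buckets rather than by the number of splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Plain-Java model of the "random key + key-stripping reducer" idea.
// Not Hadoop code: all class and method names here are hypothetical.
final class BucketedExportSketch {

    // "Map" step: tag each value with a random bucket in [0, numBuckets)
    // and group the values by that bucket, as the shuffle would.
    static Map<Integer, List<String>> assignBuckets(List<String> values, int numBuckets, Random rng) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String value : values) {
            int bucket = rng.nextInt(numBuckets);
            buckets.computeIfAbsent(bucket, k -> new ArrayList<>()).add(value);
        }
        return buckets;
    }

    // "Reduce" step: drop the bucket key and keep only the values.
    // Each inner list stands in for the contents of one output file.
    static List<List<String>> stripKeys(Map<Integer, List<String>> buckets) {
        List<List<String>> outputs = new ArrayList<>();
        for (List<String> bucketValues : buckets.values()) {
            outputs.add(new ArrayList<>(bucketValues));
        }
        return outputs;
    }
}
```

For example, calling `stripKeys(assignBuckets(values, 50, new Random()))` yields at most 50 lists no matter how many input values there are. In this model an empty bucket simply produces no output, and the order of values across buckets is not preserved.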