It looks as if Hadoop handles compression transparently when using TextInputFormat (when was this introduced? I don't remember it in 0.20.203). Unfortunately, with LZO compression, Hadoop doesn't use the LZO index file to make the file splittable. However, if I set the input format to com.hadoop.mapreduce.LzoTextInputFormat, the file is split.
Is it possible to configure Hadoop to decompress LZO files and split them when using TextInputFormat?