I am setting up the LZO codec as the compression tool for my Hadoop jobs. I know that LZO has the desirable property of producing splittable files, but I have not found a way to make LZO create splittable files automatically. The blogs I have read so far all describe running the indexer outside the job and then feeding the resulting LZO file as input to the MapReduce job.
I am running some Hadoop benchmarks whose code I do not want to change; I only want to enable LZO compression in Hadoop and see its effect on the benchmark. I plan to use LZO as the codec for compressing map output, but if that output is not splittable, the next phase will have to pull the whole compressed output onto the nodes before it can do any work.
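To make the setup concrete, here is a rough sketch of the map-output settings I have in mind, rendered as `-D` options so that the benchmark code stays untouched. It rests on several assumptions: the property names are the Hadoop 2.x ones (older releases use `mapred.*` names instead), the codec class is the `LzoCodec` from the hadoop-lzo project, and the `-D` options are only picked up if the benchmark parses generic options (for example through `ToolRunner`). The class `LzoMapOutputOptions` and its methods are hypothetical helpers of mine, not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or hadoop-lzo.
public class LzoMapOutputOptions {

    // Assumed Hadoop 2.x property names for compressing intermediate map
    // output, with the LzoCodec class provided by the hadoop-lzo project.
    static Map<String, String> mapOutputCompression() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.map.output.compress", "true");
        props.put("mapreduce.map.output.compress.codec",
                  "com.hadoop.compression.lzo.LzoCodec");
        return props;
    }

    // Render the properties as -D generic options for the command line,
    // in insertion order and separated by single spaces.
    static String toGenericOptions(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D ").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toGenericOptions(mapOutputCompression()));
    }
}
```

The printed options would go between the benchmark's class name and its own arguments on the `hadoop jar` command line. This only turns on compression of the map output; it does nothing about indexing, which is the part I am asking about.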
Is there a Hadoop configuration option that instructs LZO to make its output files splittable, so that this happens transparently?