I have written a MapReduce job that takes Protobuf files as input. Because the files are unsplittable by nature, each file is processed by a single mapper (I implemented a custom `FileInputFormat` with `isSplitable` returning `false`). The application works fine for input files smaller than ~680MB and produces the expected output files; however, once the input file size crosses that limit, the job completes successfully but produces an empty file.
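For reference, here is a simplified sketch of what my input format looks like (the class name and the whole-file `RecordReader` below are illustrative; my real reader parses the Protobuf stream instead of returning raw bytes):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Never split: each file goes to exactly one mapper.
        return false;
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new RecordReader<NullWritable, BytesWritable>() {
            private FileSplit fileSplit;
            private Configuration conf;
            private final BytesWritable value = new BytesWritable();
            private boolean processed = false;

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context) {
                this.fileSplit = (FileSplit) split;
                this.conf = context.getConfiguration();
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                if (processed) {
                    return false;
                }
                // Read the entire file as one record. Note the int cast:
                // files over Integer.MAX_VALUE (~2GB) would overflow here.
                byte[] contents = new byte[(int) fileSplit.getLength()];
                Path file = fileSplit.getPath();
                FileSystem fs = file.getFileSystem(conf);
                try (FSDataInputStream in = fs.open(file)) {
                    IOUtils.readFully(in, contents, 0, contents.length);
                }
                value.set(contents, 0, contents.length);
                processed = true;
                return true;
            }

            @Override
            public NullWritable getCurrentKey() {
                return NullWritable.get();
            }

            @Override
            public BytesWritable getCurrentValue() {
                return value;
            }

            @Override
            public float getProgress() {
                return processed ? 1.0f : 0.0f;
            }

            @Override
            public void close() {
            }
        };
    }
}
```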
I'm wondering if I'm hitting some file-size limit for a single mapper? If it matters, the files are stored on Google Cloud Storage (GCS), not HDFS.
Thanks!