I have a scenario where I process thousands of small files with Hadoop. The output of the Hadoop job is then used as input to a non-Hadoop algorithm. In the current workflow, the data is read, converted to SequenceFiles, and processed, and the resulting small files are written back to HDFS as SequenceFiles. However, the non-Hadoop algorithm cannot read SequenceFiles, so I've written another simple Hadoop job that reads the records back out of the SequenceFiles and creates the final small files the non-Hadoop algorithm can consume.
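That conversion job looks roughly like the sketch below. It's a minimal, map-only version, assuming the SequenceFile keys are the original file names (Text) and the values are the file contents (BytesWritable); the class name, configuration key, and paths are placeholders, not my exact code:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SequenceToSmallFiles {

    public static class UnpackMapper
            extends Mapper<Text, BytesWritable, Text, Text> {

        private Path outputDir;

        @Override
        protected void setup(Context context) {
            // Directory where the unpacked small files should land
            // ("unpack.output.dir" is a placeholder config key).
            outputDir = new Path(context.getConfiguration().get("unpack.output.dir"));
        }

        @Override
        protected void map(Text fileName, BytesWritable contents, Context context)
                throws IOException, InterruptedException {
            // Write one output file per SequenceFile record; the mapper
            // writes the files itself instead of going through an OutputFormat.
            FileSystem fs = outputDir.getFileSystem(context.getConfiguration());
            try (FSDataOutputStream out = fs.create(new Path(outputDir, fileName.toString()))) {
                out.write(contents.getBytes(), 0, contents.getLength());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("unpack.output.dir", args[1]);

        Job job = Job.getInstance(conf, "sequencefile-to-small-files");
        job.setJarByClass(SequenceToSmallFiles.class);
        job.setMapperClass(UnpackMapper.class);
        job.setNumReduceTasks(0);                         // map-only job
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(NullOutputFormat.class); // mapper handles all output

        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```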
The catch is that this final job has to read the SequenceFiles from HDFS and write its output to the local file system of each node, where the non-Hadoop algorithm will process it. I've tried setting the output path to file:///<local-fs-path> and using Hadoop's LocalFileSystem class, but either way the final results end up on the namenode's local file system only.
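Concretely, the two attempts look along these lines (a sketch only; the real path is omitted above, so "/tmp/final-output" here is a hypothetical placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalOutputAttempt {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "write-to-local-fs");

        // Attempt 1: point the job's output path directly at the local
        // file system via the file:// scheme.
        FileOutputFormat.setOutputPath(job, new Path("file:///tmp/final-output"));

        // Attempt 2: obtain a LocalFileSystem handle and write through it.
        LocalFileSystem localFs = FileSystem.getLocal(conf);
        localFs.mkdirs(new Path("/tmp/final-output"));
    }
}
```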
Just to complete the picture: I have a 10-node Hadoop setup running YARN. Is there a way, in Hadoop's YARN mode, to read data from HDFS and write the results to the local file system of each processing node?
Thanks