Is it possible to execute a Hadoop Streaming job that has no input file?
In my use case, I'm able to generate the necessary records for the reducer from a single mapper and the job's execution parameters alone. Currently I'm using a stub input file with a single line, and I'd like to remove this requirement.
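For context, this is roughly the current workaround (script and variable names are placeholders): the one-line stub file forces a single map task, and the mapper discards its input entirely.

```python
#!/usr/bin/env python
# generate_records.py -- hypothetical mapper for the stub-input workaround.
# The single stub line arrives on stdin and is discarded; the records the
# reducer needs are built purely from execution parameters.
import os
import sys

# Drain (and ignore) the stub input.
for _ in sys.stdin:
    pass

# Execution parameters can be passed via -cmdenv NAME=value on the
# hadoop streaming command line and read from the environment here.
num_records = int(os.environ.get("NUM_RECORDS", "10"))

for i in range(num_records):
    # Emit key<TAB>value pairs for the reducers.
    sys.stdout.write("key%d\tvalue%d\n" % (i, i))
```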
We have two use cases in mind:

1) I want to distribute the loading of files into HDFS from a network location available to all nodes. Basically, I'm going to run `ls` in the mapper and send the output to a small set of reducers (a sketch of such a mapper is below).

2) We are going to run fits over several different parameter ranges against several models. The model names do not change and will go to the reducer as keys, while the list of tests to run is generated in the mapper (see the second sketch below).
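A rough sketch of the first mapper, assuming the network location is mounted at the same (placeholder) path on every node:

```python
#!/usr/bin/env python
# list_files.py -- hypothetical mapper for use case 1.
# Ignores stdin and emits one record per file found under the shared
# network mount; reducers then copy their assigned files into HDFS.
import os
import sys

for _ in sys.stdin:  # discard the stub input
    pass

SHARED_DIR = "/mnt/shared"  # placeholder for the network location

for name in sorted(os.listdir(SHARED_DIR)):
    path = os.path.join(SHARED_DIR, name)
    if os.path.isfile(path):
        # Key on the file name so the paths spread across reducers.
        sys.stdout.write("%s\t%s\n" % (name, path))
```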
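And for the second use case (model names and the parameter range are made up), the mapper would emit every (model, test) pair, so all tests for one model land on the same reducer:

```python
#!/usr/bin/env python
# generate_tests.py -- hypothetical mapper for use case 2.
# Emits model_name<TAB>test_spec pairs; the fixed model names act as
# the keys that route work to reducers.
import sys

for _ in sys.stdin:  # discard the stub input
    pass

MODELS = ["modelA", "modelB"]              # placeholder model names
PARAM_RANGE = [0.1 * i for i in range(5)]  # placeholder parameter range

for model in MODELS:
    for p in PARAM_RANGE:
        sys.stdout.write("%s\tparam=%.1f\n" % (model, p))
```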