I've been running several MapReduce jobs on a Hadoop cluster from a single JAR file. The JAR's main class accepts an XML file as a command-line parameter. The XML file contains the input and output paths for each job (name-value property pairs), and I use these to configure each MapReduce job. I'm able to load the paths into the Configuration like so:
Configuration config = new Configuration(false);
config.addResource(new FileInputStream(args[0]));
I am now trying to run the JAR using Amazon's Elastic MapReduce. I tried uploading the XML file to S3, but of course using FileInputStream to load the path data from S3 doesn't work (FileNotFoundException).
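Is going through Hadoop's FileSystem API (so that s3:// / s3n:// URIs get resolved) the right direction? Something like the sketch below is what I had in mind, though I don't know if it's the recommended approach on EMR (the class name JobConfigLoader and the bucket path in the comment are just placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JobConfigLoader {
    // Resolve the URI through Hadoop's FileSystem abstraction instead of java.io,
    // so a local path, an HDFS path, or an S3 URI (e.g. s3n://my-bucket/jobs.xml) should all work.
    public static Configuration loadJobConfig(String uri) throws IOException {
        Path xmlPath = new Path(uri);
        FileSystem fs = xmlPath.getFileSystem(new Configuration());
        Configuration config = new Configuration(false);
        config.addResource(fs.open(xmlPath)); // addResource(InputStream), same as before
        return config;
    }
}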
How can I pass the XML file to the JAR when using EMR?
(I looked at bootstrap actions, but as far as I can tell they're for specifying Hadoop-specific configuration.)
Any insight would be appreciated. Thanks.