11
votes

I tried a simple Map/Reduce task using Amazon Elastic MapReduce, and it took just 3 minutes to complete. Is it possible to reuse the same instance to run another task?

Even though I used the instance for only 3 minutes, Amazon will charge for a full hour, so I want to use the remaining 57 minutes to run several other tasks.

3
Did we help to answer your question? - Matthew Rathbone

3 Answers

14
votes

The answer is yes.

Here's how you do it using the command-line client:

When you create the cluster, pass the --alive flag. This tells EMR to keep the cluster around after your job has run.

Then you can submit more tasks to the cluster:

elastic-mapreduce --jobflow <job-id> --stream --input <s3dir> --output <s3dir> --mapper <script1> --reducer <script2>

To terminate the cluster later, simply run:

elastic-mapreduce --jobflow <job-id> --terminate

Try running elastic-mapreduce --help to see all the commands you can run.

If you don't have the command line client, get it here.
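
To tie the three steps together, here is a rough dry-run sketch, not a tested recipe. It only prints the commands it would run. The --create option is my assumption about how the cluster gets started (only --alive is mentioned above), and the job flow ID, bucket paths, and script names are hypothetical placeholders. Check everything against elastic-mapreduce --help before running it for real.

```shell
#!/bin/sh
# Dry-run sketch: by default this only echoes the commands.
# Export EMR=elastic-mapreduce to actually execute them.
EMR="${EMR:-echo elastic-mapreduce}"

# 1. Start a cluster that stays up after its steps finish.
#    ASSUMPTION: --create starts the job flow; --alive keeps it running.
create_cluster() {
    $EMR --create --alive
}

# 2. Add a streaming step to the running cluster.
#    $1 = job flow id, $2 = input dir, $3 = output dir,
#    $4 = mapper script, $5 = reducer script
add_streaming_step() {
    $EMR --jobflow "$1" --stream --input "$2" --output "$3" \
        --mapper "$4" --reducer "$5"
}

# 3. Shut the cluster down once all the work is done.
terminate_cluster() {
    $EMR --jobflow "$1" --terminate
}

# Usage with hypothetical placeholder values.
JOB_ID="j-HYPOTHETICAL"
create_cluster
add_streaming_step "$JOB_ID" "s3n://my-bucket/in" "s3n://my-bucket/out" \
    "s3n://my-bucket/map.py" "s3n://my-bucket/reduce.py"
terminate_cluster "$JOB_ID"
```

Repeating step 2 as many times as needed is what lets several short jobs share the hour you have already paid for.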

2
votes

Using:

elastic-mapreduce --jobflow job-id \
    --jar s3n://some-path/x.jar \
    --step-name "New step name" \
    --args ...

you can also add non-streaming steps to your cluster. (Just so you don't have to try it yourself ;-) )

0
votes

http://aws.amazon.com/elasticmapreduce/faqs/#dev-6

Q: Can I run a persistent job flow?

Yes. Amazon Elastic MapReduce job flows that are started with the --alive flag will continue until explicitly terminated. This allows customers to add steps to a job flow on demand. You may want to use this to debug your job flow logic without having to repeatedly wait for job flow startup. You may also use a persistent job flow to run a long-running data warehouse cluster. This can be combined with data warehouse and analytics packages that run on top of Hadoop, such as Hive and Pig.