19 votes

I am looking to build a web application which needs to run resource-intensive MCMC (Markov chain Monte Carlo) calculations on-demand in R to generate some probability graphs for the user.

Constraints:

  1. Obviously I don't want to run the resource-intensive calculations on the same server as the web app front-end, so these tasks need to be handed off to a worker instance.

  2. These calculations take a good amount of CPU to run and I'd like to keep latency as low as possible (hopefully seconds, not minutes), so I would prefer to run the calculations on beefier hardware.

  3. I cannot afford to run a beefy EC2 instance at ~66¢/hr for 24hrs/day, so launching instances only when needed (on-demand or via spot requests) is probably necessary.

Here are the options I've come up with:

  1. Run a cheap (but slower) worker instance 24 hours a day which takes one task at a time, managed by Amazon SWF (or SQS); a minimal sketch of this worker loop follows the list below.

    Cons:

    • high latency - Cheaper hardware, longer wait times.



  2. Spawn a beefier worker instance per-task (spun up whenever a job is added to the queue) and terminate the instance upon completion.

    Cons:

    • expensive/wasteful - I'd be paying for a full hour of server time each time while only using seconds of it for my calculation.

    • startup overhead - Would spinning up a new EC2 instance on demand introduce non-negligible latency (defeating the whole purpose of using beefier hardware)?



  3. Like #2 but with low-bid EC2 spot requests.

    Cons:

    • startup overhead - See #2

    • inconsistency? - I've never worked with spot requests before, so I have no idea how volatile or hands-on such a solution would be. Do I have to continually adjust my bids to make sure I can still get tasks done at peak hours? I suppose I'd also have to monitor my processes closely to make sure they aren't interrupted mid-calculation.



  4. Some kind of hybrid solution where I actively monitor beefy-hardware worker instances and their loads, and intelligently spin up and terminate instances on the hour to maintain an optimal balance of cost and availability.

    Cons:

    • complicated and costly setup - Unless there's a good managed service out there to handle stuff like this, I'd have to set all of that infrastructure up myself...
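To make option 1 concrete, here is roughly the worker loop I have in mind (the sketch referenced above). It's only a rough sketch under my own assumptions, written in Python with boto3 as glue around the R script; the queue name and `run_mcmc.R` are placeholder names:

```python
# Sketch of option 1: a long-running worker that polls SQS for jobs and
# shells out to R for the MCMC run. Assumes AWS credentials are configured
# and that "mcmc-jobs" and run_mcmc.R are placeholder names of my own.
import json
import subprocess

import boto3

sqs = boto3.resource("sqs")
queue = sqs.get_queue_by_name(QueueName="mcmc-jobs")

while True:
    # Long-poll for up to 20 seconds so the loop isn't busy-waiting when idle.
    for message in queue.receive_messages(MaxNumberOfMessages=1, WaitTimeSeconds=20):
        job = json.loads(message.body)

        # Hand the heavy lifting to R; the script would write its result
        # (e.g. the probability graph) somewhere the web front-end can read it.
        subprocess.run(["Rscript", "run_mcmc.R", json.dumps(job)], check=True)

        # Delete the message only after the calculation finished successfully.
        message.delete()
```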

I wish there were some service where I could pay for highly available, on-demand hardware on a minute-by-minute basis rather than hourly.

So my questions are the following:

  • How would you recommend solving this problem?

  • Is there a good EC2 instance-management solution that could sit on top of Amazon SWF and help me load-balance and terminate idle workers? (A rough sketch of the sort of glue I mean follows below.)

  • Would spot-request bids solve my problem or are they more suited to tasks which don't necessarily need to be completed right away?
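To illustrate what I mean in the second question, this is the sort of instance-management glue I'd rather not write and babysit myself. Again just a sketch in Python/boto3 with made-up values (AMI ID, instance type, tag):

```python
# Sketch of the spin-up/terminate glue behind options 2-4: launch a beefier
# worker when a job arrives, and terminate workers once they go idle.
# The AMI ID and instance type below are placeholders, not real values.
import boto3

ec2 = boto3.resource("ec2")


def launch_worker():
    """Start one beefy worker from a pre-baked AMI containing R and the worker code."""
    (instance,) = ec2.create_instances(
        ImageId="ami-xxxxxxxx",      # placeholder AMI
        InstanceType="c3.2xlarge",   # placeholder instance type
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": "mcmc-worker"}],
        }],
    )
    return instance


def terminate_idle_workers(idle_instance_ids):
    """Terminate workers that have gone idle (e.g. near the end of a billed hour)."""
    if idle_instance_ids:
        ec2.instances.filter(InstanceIds=idle_instance_ids).terminate()
```

Deciding when to call these (per job? on the hour? based on queue depth?) is exactly the part I was hoping a managed service could take off my hands.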

If it is possible to implement your MCMC in Python, you could use Google App Engine. App Engine billing is mostly based on the resources your application actually uses, rather than EC2's hourly instance renting. – jthetzel
Unfortunately, I'm pretty stuck on R... it handles most of the heavy lifting, and the guys I'm working with are math people, not programmers, so it would fall to me to maintain it. – mikegreiling
There is also the Renjin project, which aims to be an R interpreter for the JVM. Eventually you should be able to use it to run R on Java App Engine, but I don't know how stable it is at the moment. – jthetzel
Is it possible for you to prepare them in advance rather than on demand (i.e., at scheduled release times for a battery of options)? – Brandon Bertelsen
Unfortunately, no. The way the application works, the user enters all of their data for the latest calculation in the Markov chain, submits a form, and expects a result. I could always give them a message saying to check back in 15 minutes, but I was hoping for something more immediate. I suppose I could analyze the times of day when the service is heavily utilized and schedule uptime for the worker instance around that, but its usage is fairly unpredictable and sporadic. – mikegreiling

2 Answers

4 votes

There's another option that you may not be aware of. I actually just stumbled upon it: http://multyvac.com

I have no experience using it (so I can't vouch for it), but it looks like the first solution I've seen that actually offers true "utility computing". It began with just Python but now supports any language.

4 votes

I wish there were some service where I could pay for highly available, on-demand hardware on a minute-by-minute basis rather than hourly.

That service is AWS Lambda, which wasn't available when you asked the question:

Lambda runs your code on high-availability compute infrastructure and performs all the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling

Pricing:

You are charged based on the number of requests for your functions and the time your code executes

Duration is calculated from the time your code begins executing until it returns or otherwise terminates, rounded up to the nearest 100ms.

The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month.
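As a rough illustration of what that buys (made-up but plausible numbers, not your actual workload): a calculation that runs for 10 seconds on a 1,536 MB function consumes 10 × 1.5 = 15 GB-seconds, so the 400,000 GB-second free tier alone would cover on the order of 26,000 such runs per month before any duration charges apply.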

You can also wrap a Lambda function with an HTTP endpoint, possibly removing the hand-off layer from your application entirely:

You can invoke a Lambda function over HTTPS by defining a custom RESTful API using Amazon API Gateway. This gives you an endpoint for your function which can respond to REST calls like GET, PUT and POST. Read more about using AWS Lambda with Amazon API Gateway.

Caveat: Lambda currently supports only JavaScript, Java, and Python, so I'm not sure how you would get R to work. You may need to host R in one of these runtimes.
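For what it's worth, here is a hand-wavy sketch of what "hosting R in one of these runtimes" could look like: a Python handler that shells out to an R script. It assumes you have somehow bundled a working `Rscript` binary and your script into the deployment package, which is the genuinely hard part and isn't shown; the paths and names below are hypothetical.

```python
# Hypothetical Lambda handler that delegates the MCMC run to a bundled R runtime.
# Assumes ./r/bin/Rscript and mcmc.R were packaged into the deployment zip.
import json
import subprocess


def handler(event, context):
    # With API Gateway in front, the user's form data arrives as the event.
    payload = json.dumps(event)

    result = subprocess.run(
        ["./r/bin/Rscript", "mcmc.R", payload],  # placeholder paths
        capture_output=True,
        text=True,
        check=True,
    )

    # Assume mcmc.R prints its result (e.g. plot data) as JSON on stdout.
    return json.loads(result.stdout)
```

With API Gateway in front of that, your web app could POST the form data straight to the endpoint and get the result back, subject to Lambda's memory and timeout limits.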