12
votes

I am running the termstrc yield curve analysis package in R across 10 years of daily bond price data for 5 different countries. This is highly compute-intensive: it takes 3200 seconds per country with a standard lapply, and if I use foreach with %dopar% (via doSNOW) on my 2009 i7 Mac, using all 4 cores (8 with hyperthreading), I get this down to 850 seconds. I need to re-run this analysis every time I add a country (to compute inter-country spreads), and I have 19 countries to go, with many more credit yield curves to come in the future. The time taken is starting to look like a major issue. Incidentally, the termstrc analysis function in question is called from R but is written in C.
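For context, a minimal sketch of the lapply-to-foreach conversion described above; `estimate_curve()` and `country_data` are placeholders for the actual termstrc call and data, not real names from my code:

```r
# Sketch only: estimate_curve() and country_data stand in for the
# real termstrc estimation call and the per-country bond data.
library(foreach)
library(doSNOW)

cl <- makeCluster(4, type = "SOCK")   # one worker per physical core
registerDoSNOW(cl)

results <- foreach(d = country_data,
                   .packages = "termstrc") %dopar% {
  estimate_curve(d)                   # ~1.2 s per iteration
}

stopCluster(cl)
```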

Now, we're a small company of 12 people (read: limited budget), all equipped with 8 GB RAM, i7 PCs, of which at least half are used for mundane word-processing / email / browsing tasks, that is, at 5% of their capacity at most. They are all networked over gigabit (but not 10-gigabit) Ethernet.

Could I cluster some of these underused PCs using MPI and run my R analysis across them? Would the network be affected? Each iteration of the yield curve analysis function takes about 1.2 seconds, so I'm assuming that if the granularity of the parallelism is one whole function iteration per cluster node, 1.2 seconds should be large compared with gigabit Ethernet latency?
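To make the question concrete, here is the kind of thing I imagine, using a snow socket cluster over the LAN rather than MPI. The hostnames are placeholders, and I assume each machine would need R, the termstrc package, and passwordless SSH (or some other transport snow supports):

```r
# Hedged sketch: spread workers across networked PCs via snow sockets.
# "pc-alice" / "pc-bob" are hypothetical hostnames; estimate_curve()
# and country_data are placeholders as before.
library(foreach)
library(doSNOW)

hosts <- rep(c("pc-alice", "pc-bob"), each = 4)  # 4 workers per box
cl <- makeCluster(hosts, type = "SOCK")
registerDoSNOW(cl)

results <- foreach(d = country_data,
                   .packages = "termstrc") %dopar% estimate_curve(d)

stopCluster(cl)
```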

Can this be done? How? And what would the impact be on my co-workers? Can they continue to read their emails while I'm taxing their machines?

I note that Open MPI seems not to support Windows anymore, while MPICH seems to. Which would you use, if any?
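If MPI is the route, my understanding is that from R it would be driven through Rmpi/doMPI, roughly like this (a sketch assuming a working MPI installation such as MPICH on every machine):

```r
# Sketch of the MPI route via Rmpi/doMPI; assumes MPI (e.g. MPICH)
# is installed and configured on all participating machines.
library(doMPI)

cl <- startMPIcluster(count = 8)   # spawn 8 MPI worker processes
registerDoMPI(cl)

nodes <- foreach(i = 1:8) %dopar% Sys.info()[["nodename"]]

closeCluster(cl)
mpi.quit()
```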

Perhaps run an Ubuntu virtual machine on each PC?

3
Virtual machines are notorious memory hogs, not to mention that they are practically just a layer on top of another layer (think of the I/O flow-through). Your co-workers won't thank you when they notice that 50% of their memory is being chunked out for something that you couldn't possibly use efficiently, even if all they are doing is Word and email. Even Chrome can get up to 2 GB nowadays on 64-bit systems if you open enough windows. – Brandon Bertelsen
Gotcha, though I doubt they would even notice, to be honest. It just seems a waste to see 99% of CPU cycles idling when I have a good use for them! BTW, VMware Fusion on my Mac exacts about a 25% performance penalty versus "native" R (that is, running the same routine on Win64 in a VM, with 4 processors and 8 of 16 GB assigned), so it's not that bad, though I agree on the RAM. – Thomas Browne
Did you find a working answer to your question? I'm working on the same problem here. – jclouse

3 Answers

11
votes

Yes, you can. There are a number of ways. One of the easiest is to use Redis as a backend (as easy as running sudo apt-get install redis-server on an Ubuntu machine; rumor has it that you can run a Redis backend on a Windows machine too).

By using the doRedis package, you can very easily enqueue jobs on a task queue in Redis and then have one, two, ... idle workers poll the queue. Best of all, you can easily mix operating systems, so yes, your co-workers' Windows machines qualify. Moreover, you can use one, two, three, ... workers as you see fit and need, and scale up or down. The queue does not know or care; it simply supplies jobs.

Best of all, the vignette in the doRedis package has working examples of a mix of Linux and Windows clients making a bootstrapping example go faster.
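A minimal sketch of the pattern (the queue name "jobs" and the Redis host are placeholders; see the doRedis vignette for the full treatment):

```r
# Sketch of the doRedis pattern: the master enqueues foreach tasks,
# workers anywhere on the network pull them. "jobs" and "redis-box"
# are hypothetical names.
library(foreach)
library(doRedis)

registerDoRedis("jobs", host = "redis-box")    # point at the Redis server

results <- foreach(i = 1:100) %dopar% sqrt(i)  # tasks served via "jobs"

removeQueue("jobs")

# On each idle PC (any OS), start a worker that polls the same queue:
#   library(doRedis)
#   redisWorker("jobs", host = "redis-box")
```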

6
votes

Perhaps not the answer you were looking for, but this is one of those situations where an alternative is so much better that it's hard to ignore.

The cost of AWS clusters is ridiculously low (my emphasis) for exactly these types of computing problems. You pay only for what you use. I can guarantee that you will save money (at the very least in opportunity cost) by not spending the time trying to convert 12 Windows machines into a cluster. For your purposes, you could probably even do this for free. (IIRC, they still offer free computing time on clusters.)

References:

Some of these instances are so powerful that you probably wouldn't even need to figure out how to set up your work on a cluster (given your current description). As you can see from the references, costs are ridiculously low, ranging from $1 to $4 per hour of compute time.

1
votes

What about OpenCL?

This would require rewriting the C code, but could allow potentially large speedups: a GPU has immense computing power for data-parallel work like repeated curve fitting.