I'm researching ways to distribute tasks across machines or a cluster. MPI seems to be the traditional way to do this, but the way it works also seems archaic: you write a single program that does just the task, detect in the code which node (process) you're running on, and then send data to the other processes that do the calculations.

While there is plenty of information to be found on how to use the MPI API, I can't find a comprehensive description of how tasks are actually started on the other machines. Reading between the lines, it seems that the 'task manager' (mpirun, mpiexec or similar) copies the whole executable to the other machine in a primitive way (just an scp or so) and then runs it there. Sharing data then presumably happens through network shares (NFS, CIFS or similar).
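To make sure I'm describing the model correctly, here is roughly the SPMD skeleton I have in mind, in C. This is only my own illustration; the master/worker split and the payload are made up by me, not taken from any particular implementation's documentation:

    /* Minimal sketch of the SPMD model as I understand it: every node runs
     * the same executable, asks for its rank, and exchanges data explicitly. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?     */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many were launched? */

        if (rank == 0) {
            /* "master": hand a placeholder payload to every worker */
            for (int dest = 1; dest < size; dest++) {
                double chunk = 100.0 * dest;
                MPI_Send(&chunk, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            }
        } else {
            /* "worker": receive the payload, do the CPU-intensive bit */
            double chunk;
            MPI_Recv(&chunk, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank %d got %f\n", rank, chunk);
        }

        MPI_Finalize();
        return 0;
    }

As far as I can tell this would be launched with something like mpirun -np 8 --hostfile hosts ./myprog (Open MPI syntax, if I have it right), which is exactly where my question about how the binary gets onto the other hosts comes in.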
So I'm wondering whether the specification also covers things like dependent libraries (what if my application depends on a shared library that the compute nodes don't have?) and cross-platform operation. The spec claims to be 'platform neutral', which I suppose is true in the sense that it can be implemented on every platform, but there doesn't seem to be a common or reliable way to start a task from a Windows machine on a Linux cluster, for example.
What I want to do is have a long-running program that occasionally runs CPU-intensive tasks in small bursts across many machines. I also don't want to have to write code tied to one particular cluster - I'm looking for a generic solution that can run on real, thousands-of-nodes Linux clusters as well as on small LAN 'clusters' of lab workstations that sit idle at night.
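To make the 'bursts' part concrete, what I picture is something along the lines of MPI-2 dynamic process management, roughly like the sketch below. The worker executable name (cpu_task_worker) is hypothetical, and whether that binary is available on every compute node is exactly the deployment question I'm asking about:

    /* Rough sketch of the "occasional burst" pattern using MPI-2 dynamic
     * process management. Assumes MPI_Init has already been called by the
     * long-running front end. "cpu_task_worker" is a hypothetical binary
     * that would have to be reachable on every compute node. */
    #include <mpi.h>

    void run_burst(int nworkers)
    {
        MPI_Comm workers;

        /* Ask the runtime to start nworkers copies of the worker binary. */
        MPI_Comm_spawn("cpu_task_worker", MPI_ARGV_NULL, nworkers,
                       MPI_INFO_NULL, 0, MPI_COMM_SELF,
                       &workers, MPI_ERRCODES_IGNORE);

        /* Broadcast a (placeholder) job description to the spawned workers;
         * MPI_ROOT marks this process as the root on the intercommunicator. */
        int job_id = 42;
        MPI_Bcast(&job_id, 1, MPI_INT, MPI_ROOT, workers);

        /* ... gather results here, then release the intercommunicator ... */
        MPI_Comm_disconnect(&workers);
    }

I don't know whether implementations handle spawning like this onto idle lab workstations as well as onto a proper managed cluster, which is part of what I'm asking.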
Is that possible with MPI? Or maybe I should ask: are there implementations of MPI that support this? Or should I forget about MPI and just implement my own task-specific solution?