My story
I am quite a beginner in parallel programming (I didn't ever do anything more than writing some basic multithreaded things) and I need to parallelize some multithreaded java-code in order to make it run faster. The multithreaded algorithm simply generates threads and passes them to the operating system which does the distribution of threads for me. The results of every thread can be gathered by some collector that also handles synchronisation issues with semaphores etc and calculates the sum of the results of all different threads. The multithreaded code kinda looks like this:
public static void main(String[] args) {
int numberOfProcesses = Integer.parseInt(args[0]);
...
Collector collector = new Collector(numberOfProcesses);
while(iterator.hasNext()) {
Object x = iterator.next();
new OverwrittenThread(x, collector, otherParameters).start();
}
if(collector.isReady())
System.out.prinltn(collector.getResult());
}
My first idea to convert this to MPI, was the basic way (I guess) to just split up the loop and give every iteration of this loop to another processor like this (with mpiJava):
public static void main(String[args]) {
...
Object[] foo = new Object[number];
int i = 0;
while(iterator.hasNext())
foo[i++] = iterator.next();
...
int myRank = MPI.COMM_WORLD.Rank();
for(int i = myRank; i < numberOfElementsFromIterator; i += myRank) {
//Perform code from OverwrittenThread on foo[i]
}
MPI.COMM_WORLD.Reduce(..., MPI.SUM, ...);
}
The problems
This is, till now, the only way that I, as newbie in mpi, could make things work. This is only an idea, because I have no idea how to tackle implementation-problems like conversion of BigIntegers to MPI datatypes, etc. (But I would get this far, I guess)
The real problem though, is the fact that this approach of solving the problem, leaves the distribution of work very unbalanced because it doesn't take into account how much work a certain iteration takes. This might really cause some troubles as some iterations can be finished in less than a second and others might need several minutes.
My question
Is there a way to get a similar approach like the multithreaded version in an MPI-implementation? At first I thought it would just be a lot of non-blocking point-to-point communication, but I don't see a way to make it work that way. I also considered using the scatter-functionality, but I have too much troubles understanding how to use it correctly.
Could anybody help me to clear this out, please?
(I do understand basic C etc)
Thanks in advance