0
votes

I spent the last few days on creating a parallel version of a code (college work), but I came to a dead end (at least for me): The parallel version is nearly as twice slower than the sequential one, and I have no clue on why. Here is the code:

Variables.GetMatrix();
int ThreadNumber = Environment.ProcessorCount/2;
int SS = Variables.PopSize / ThreadNumber;
//GeneticAlgorithm GA = new GeneticAlgorithm();
Stopwatch stopwatch = new Stopwatch(), st = new Stopwatch(), st1 = new Stopwatch();
List<Thread> ThreadList = new List<Thread>();
//List<Task> TaskList = new List<Task>();
GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
Thread t;
//Task t;
ThreadVariables Instance = new ThreadVariables();

stopwatch.Start();
st.Start();
PopSettings();
InitialPopulation();
st.Stop();

//Lots of attributions...
int SPos = 0, EPos = SS;

for (int i = 0; i < ThreadNumber; i++)
{
    int temp = i, StartPos = SPos, EndPos = EPos;
    t = new Thread(() =>
    {
        SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, StartPos, EndPos);
        SubPop[temp].RunGA();
        SubPop[temp].ShowPopulation();
    });
    t.Start();
    ThreadList.Add(t);
    SPos = EPos;
    EPos += SS;
}

foreach (Thread a in ThreadList)
    a.Join();

double BestFit = SubPop[0].BestSol;
string BestAlign = SubPop[0].TV.Debug;

for (int i = 1; i < ThreadNumber; i++)
{
    if (BestFit < SubPop[i].BestSol)
    {
        BestFit = SubPop[i].BestSol;
        BestAlign = SubPop[i].TV.Debug;
        Variables.ResSave = SubPop[i].TV.ResSave;
        Variables.NumSeq = SubPop[i].TV.NumSeq;
    }
}

Basically the code creates an array of the object type, instantiante and run the algorithm in each position of the array, and collecting the best value of the object array at the end. This type of algorithm works on a three-dimentional data array, and on the parallel version I assign each thread to process one range of the array, avoiding concurrency on data. Still, I'm getting the slow timing... Any ideas?

I'm using an Core i5, which has four cores (two + two hyperthreading), but any amount of threads greater than one I use makes the code run slower.

What I can explain of the code I'm running in parallel is:

The second method being called in the code I posted makes about 10,000 iterations, and in each iteration it calls one function. This function may or may not call others more (spread across two different objects for each thread) and make lots of calculations, it depends on a bunch of factors which are particular of the algorithm. And all these methods for one thread work in an area of a data array that isn't accessed by the other threads.

3
How does a Parallel.ForEach compare? - Tim S.
If it's a fast algorithm, threading overhead could easily nullify any benefit of parallelizing. - Kendall Frey
Check out this documentation for plinq it also has information on what might cause parallel operations to be slower. - Sam I am says Reinstate Monica
in a very quick view of your code...a.Join() maybe the cause of you problems - apomene
You're not showing us the actual code that you're trying to run in parallel - you're showing us the code that sets up the parallelism and then aggregates the results. If there's a bottleneck, it's highly likely to be in the code that you're running in parallel. - Damien_The_Unbeliever

3 Answers

1
votes

With System.Linq there is a lot to make simpler:

int ThreadNumber = Environment.ProcessorCount/2;
int SS = Variables.PopSize / ThreadNumber;
int numberOfTotalIterations = // I don't know what goes here.

var doneAlgorithms = Enumerable.Range(0, numberOfTotalIterations)
                               .AsParallel() // Makes the whole thing running in parallel
                               .WithDegreeOfParallelism(ThreadNumber) // We don't need this line if you want the system to manage the number of parallel processings.
                               .Select(index=> _runAlgorithmAndReturn(index,SS))
                               .ToArray(); // This is obsolete if you only need the collection of doneAlgorithms to determine the best one.
                                           // If not, keep it to prevent multiple enumerations.

// So we sort algorithms by BestSol ascending and take the first one to determine the "best".
// OrderBy causes a full enumeration, hence the above mentioned obsoletion of the ToArray() statement.
GeneticAlgorithm best = doneAlgorithms.OrderBy(algo => algo.BestSol).First();

BestFit = best.Bestsol;
BestAlign = best.TV.Debug;
Variables.ResSave = best.TV.ResSave;
Variables.NumSeq = best.TV.NumSeq;

And declare a method to make it a bit more readable

/// <summary>
/// Runs a single algorithm and returns it
/// </summary>
private GeneticAlgorithm _runAlgorithmAndReturn(int index, int SS)
{
    int startPos = index * SS;
    int endPos = startPos + SS;
    var algo = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, startPos, endPos);
    algo.RunGA();
    algo.ShowPopulation();
    return algo;
}
0
votes

There is a big overhead in creating threads.

Instead of creating new threads, use the ThreadPool, as show below:

Variables.GetMatrix();
int ThreadNumber = Environment.ProcessorCount / 2;
int SS = Variables.PopSize / ThreadNumber;
//GeneticAlgorithm GA = new GeneticAlgorithm();
Stopwatch stopwatch = new Stopwatch(), st = new Stopwatch(), st1 = new Stopwatch();
List<WaitHandle> WaitList = new List<WaitHandle>();
//List<Task> TaskList = new List<Task>();
GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
//Task t;
ThreadVariables Instance = new ThreadVariables();

stopwatch.Start();
st.Start();
PopSettings();
InitialPopulation();
st.Stop();
//lots of attributions...
int SPos = 0, EPos = SS;

for (int i = 0; i < ThreadNumber; i++)
{
    int temp = i, StartPos = SPos, EndPos = EPos;
    ManualResetEvent wg = new ManualResetEvent(false);
    WaitList.Add(wg);
    ThreadPool.QueueUserWorkItem((unused) =>
    {
        SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, StartPos, EndPos);
        SubPop[temp].RunGA();
        SubPop[temp].ShowPopulation();
        wg.Set();
    });

    SPos = EPos;
    EPos += SS;
}

ManualResetEvent.WaitAll(WaitList.ToArray());

double BestFit = SubPop[0].BestSol;
string BestAlign = SubPop[0].TV.Debug;

for (int i = 1; i < ThreadNumber; i++)
{
    if (BestFit < SubPop[i].BestSol)
    {
        BestFit = SubPop[i].BestSol;
        BestAlign = SubPop[i].TV.Debug;
        Variables.ResSave = SubPop[i].TV.ResSave;
        Variables.NumSeq = SubPop[i].TV.NumSeq;
    }
}

Note that instead of using Join to wait the thread execution, I'm using WaitHandles.

0
votes

You're creating the threads yourself, so there's some extreme overhead there. Parallelise like the comments suggested. Also make sure the time a single work-unit takes is long enough. A single thread/workunit should be alive for at least ~20 ms.

Pretty basic things really. I'd suggest you really read up on how multi-threading in .NET works.

I see you don't create too many threads. But the optimal threadcount can't be determined just from the processor count. The built-in Parallel class has advanced algorithms to reduce the overall time.

Partitioning and threading are some pretty complex things that require a lot knowledge to get right, so unless you REALLY know what you're doing rely on the Parallel class to handle it for you.