1
votes

I have written the LINQ statement below, but it takes a very long time to run because there are so many lines. My CPU has 8 cores, but only one is used because the query runs on a single thread.

So I wonder: can this final statement run multithreaded?

        List<string> lstAllLines = File.ReadAllLines("AllLines.txt").ToList();
        List<string> lstBannedWords = File.ReadAllLines("allBaddWords.txt")
            .Select(s => s.ToLowerInvariant())
            .Distinct().ToList();

The statement below is the one I am asking about. Can it run on multiple threads?

        List<string> lstFoundBannedWords = lstBannedWords
            .Where(s => lstAllLines
                .SelectMany(ls => ls.ToLowerInvariant().Split(' '))
                .Contains(s))
            .Distinct().ToList();

C# 5, .NET Framework 4.5

2
Looked into PLINQ? Note, this is in no way guaranteed to make anything run faster. - Adam Houldsworth
@AdamHouldsworth haven't checked yet but let me take a look :) - MonsterMMORPG
stackoverflow.com/questions/7582591/… (Understanding Speedup in PLINQ) The more expensive a query is, the better candidate it is for PLINQ. - Tim Schmelter
Wow it started using 5 cores with just adding 1 keyword hehe :D i love C# ^^ - MonsterMMORPG
@Chris .AsParallel() I think. - Adam Houldsworth
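
For readers following the comments: the one-keyword change presumably looks something like the sketch below. This is an assumption about what was done, not the asker's actual code; the class and method names (`BannedWordSearch`, `FindParallel`) are made up for illustration. PLINQ does not preserve the order of results unless `AsOrdered()` is added, and as noted above it is not guaranteed to be faster.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BannedWordSearch
{
    // Hypothetical helper: the query from the question with AsParallel() added.
    // allLines is only read, never modified, so sharing it between threads is fine.
    public static List<string> FindParallel(List<string> allLines, List<string> bannedWords)
    {
        return bannedWords
            .AsParallel()                       // partition the banned words across cores
            .Where(s => allLines
                .SelectMany(ls => ls.ToLowerInvariant().Split(' '))
                .Contains(s))                   // still re-scans every line per banned word
            .Distinct()
            .ToList();
    }
}
```

Note that this only spreads the same amount of work over more cores; every banned word still triggers a scan over all lines.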

2 Answers

5
votes

The following snippet performs that operation using the Task Parallel Library's Parallel.ForEach method. It takes each line in your 'all-lines' file, splits it on spaces, and then searches that line for banned words. Parallel.ForEach should spread the work across the available cores of your machine's processor. Hope this helps.

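// Thread-safe master collection of the banned words found (the byte value is unused).
var foundBannedWords = new System.Collections.Concurrent.ConcurrentDictionary<string, byte>();

System.Threading.Tasks.Parallel.ForEach(
    lstAllLines,
    line =>
    {
        var wordsInLine = line.ToLowerInvariant().Split(' ');
        // Where (not All) selects the banned words that actually occur in this line.
        foreach (var bannedWord in lstBannedWords.Where(w => wordsInLine.Contains(w)))
        {
            // TryAdd is safe to call from several threads and keeps each word only once.
            foundBannedWords.TryAdd(bannedWord, 0);
        }
    });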
1
votes

There is room for performance improvement before resorting to AsParallel:

HashSet<string> lstAllLines = new HashSet<string>(
                                File.ReadAllLines("AllLines.txt")
                                    .SelectMany(ls => ls.ToLowerInvariant().Split(' ')));

List<string> lstBannedWords = File.ReadAllLines("allBaddWords.txt")
                                    .Select(s => s.ToLowerInvariant())
                                    .Distinct().ToList();

List<string> lstFoundBannedWords = lstBannedWords.Where(s => lstAllLines.Contains(s))
                                    .Distinct().ToList();

Since a HashSet lookup is O(1) on average and lstBannedWords is the shorter list, you may not need any parallelism at all (total search time = lstBannedWords.Count * O(1)). Lastly, you always have the option of AsParallel.
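
If building the word set turns out to be the slow part, one way to combine the two ideas is sketched below. This is only an illustration under my own assumptions: the class and method names (`BannedWordLookup`, `BuildWordSet`, `FindBanned`) are hypothetical, and whether PLINQ helps here depends on the size of your file.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BannedWordLookup
{
    // Hypothetical helper: lowercases and splits the lines with PLINQ,
    // then collects the words into a HashSet on the calling thread.
    public static HashSet<string> BuildWordSet(IEnumerable<string> lines)
    {
        return new HashSet<string>(
            lines.AsParallel()
                 .SelectMany(ls => ls.ToLowerInvariant().Split(' ')));
    }

    // Hypothetical helper: one hash lookup per banned word, done sequentially.
    public static List<string> FindBanned(HashSet<string> words, IEnumerable<string> bannedWords)
    {
        return bannedWords.Where(w => words.Contains(w)).ToList();
    }
}
```

Usage would be along the lines of `BannedWordLookup.FindBanned(BannedWordLookup.BuildWordSet(File.ReadLines("AllLines.txt")), lstBannedWords)`. The lookup step stays single-threaded on purpose, since it is already one hash lookup per banned word.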