0
votes

I have the following DataTable:

DataTable dt = new DataTable();

This DataTable is filled by another method. It will have about 50,000 rows and 40 columns before the following statements execute.

The number of rows and columns may vary, so I have not defined a specific set of columns for the DataTable.

I want to add two columns at the end (guid and addeddate) and set the same value in all 50K rows for those two columns.

I have written a simple foreach loop for this. Is there any way to do it in parallel?

I tried using Parallel.ForEach but had no success.

//by this time my dt will have 50000 rows and 40 columns

dt.Columns.Add(new DataColumn("guid", typeof(string)));
dt.Columns.Add(new DataColumn("addeddate", typeof(DateTime)));
string sessionIDValue = Convert.ToString(Guid.NewGuid());
DateTime todayDt = DateTime.Today;                   


foreach (DataRow row in dt.Rows)
{
    row["guid"] = sessionIDValue;
    row["addeddate"] = todayDt;     
}
2
Did you already try specifying the DefaultValue for the DataColumns you are adding? I wouldn't dare to predict how well this performs (if it even works). - Filburt
@Filburt Yes, but since the structure of the DataTable is not fixed, we can't provide that. - captainsac
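
For reference, here is a minimal sketch of what the DefaultValue suggestion from the comment above could look like. It assumes that adding a DataColumn with a DefaultValue to a table that already holds rows fills those existing rows with the default, and it does not need to know the table's other columns. The class name, the helper AddConstantColumns and the stand-in table built in Main are hypothetical and only there to make the sketch self-contained. Note that rows added later would also receive the same default.

```csharp
using System;
using System.Data;

public static class DefaultValueSketch
{
    // Hypothetical helper: adds the two constant columns.
    // Assumption: rows already in the table are filled with each column's DefaultValue.
    public static void AddConstantColumns(DataTable dt, string sessionIdValue, DateTime addedDate)
    {
        dt.Columns.Add(new DataColumn("guid", typeof(string)) { DefaultValue = sessionIdValue });
        dt.Columns.Add(new DataColumn("addeddate", typeof(DateTime)) { DefaultValue = addedDate });
    }

    public static void Main()
    {
        // Stand-in for the table that is filled elsewhere; the helper never looks at its schema.
        var dt = new DataTable();
        dt.Columns.Add("anything", typeof(int));
        for (int i = 0; i < 50000; i++)
        {
            dt.Rows.Add(i);
        }

        AddConstantColumns(dt, Guid.NewGuid().ToString(), DateTime.Today);

        // Inspect the first and last rows to confirm the values were applied.
        Console.WriteLine(dt.Rows[0]["guid"]);
        Console.WriteLine(dt.Rows[dt.Rows.Count - 1]["addeddate"]);
    }
}
```

If this behaves as assumed, no per-row loop is needed at all, so it is worth trying before reaching for Parallel.ForEach.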

2 Answers

2
votes

You need to access the rows using explicit indexes, and the row index would be a perfect way to do this.

Create an array with one entry per row (e.g. 50000 entries), where each value is a row index (0, 1, 2, 3 and so on). Then run a parallel loop over that array of indexes and pass each explicit row index to your dt.Rows object.

The gist of the code would be:

// Needs: using System.Linq; using System.Threading.Tasks;

// Array of row indexes: 0, 1, 2, ... dt.Rows.Count - 1
int[] ArrayOfIndexes = Enumerable.Range(0, dt.Rows.Count).ToArray();

Parallel.ForEach(ArrayOfIndexes, index =>
{
    // DataTable is not thread-safe for writes, so serialize them
    lock (dt)
    {
        dt.Rows[index]["guid"] = sessionIDValue;
        dt.Rows[index]["addeddate"] = todayDt;
    }
});

EDIT: I found out that since DataTable isn't thread-safe for writes, you have to put a lock around your assignments. The lock serializes every write, so only one thread updates the table at a time, and that performance hit may cancel out any gain from Parallel.ForEach. Measure it against the plain foreach loop before relying on it.
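
To check whether the locked Parallel.ForEach actually helps, a rough Stopwatch comparison is enough. The sketch below is only an illustration: the class name, the helper methods and the single-column stand-in table are hypothetical, and the timings will depend on the machine and the real table shape.

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class LoopTimingSketch
{
    // Hypothetical stand-in for the table that the real code receives already filled.
    public static DataTable BuildTable(int rowCount)
    {
        var dt = new DataTable();
        dt.Columns.Add("id", typeof(int));
        for (int i = 0; i < rowCount; i++)
        {
            dt.Rows.Add(i);
        }
        dt.Columns.Add("guid", typeof(string));
        dt.Columns.Add("addeddate", typeof(DateTime));
        return dt;
    }

    // Plain loop from the question.
    public static void FillSequential(DataTable dt, string guid, DateTime date)
    {
        foreach (DataRow row in dt.Rows)
        {
            row["guid"] = guid;
            row["addeddate"] = date;
        }
    }

    // Parallel loop over row indexes, with every write serialized by a lock.
    public static void FillParallelLocked(DataTable dt, string guid, DateTime date)
    {
        var gate = new object();
        Parallel.ForEach(Enumerable.Range(0, dt.Rows.Count), index =>
        {
            lock (gate)
            {
                dt.Rows[index]["guid"] = guid;
                dt.Rows[index]["addeddate"] = date;
            }
        });
    }

    public static void Main()
    {
        string guid = Guid.NewGuid().ToString();
        DateTime today = DateTime.Today;

        DataTable first = BuildTable(50000);
        var watch = Stopwatch.StartNew();
        FillSequential(first, guid, today);
        Console.WriteLine("foreach:          " + watch.ElapsedMilliseconds + " ms");

        DataTable second = BuildTable(50000);
        watch.Restart();
        FillParallelLocked(second, guid, today);
        Console.WriteLine("Parallel.ForEach: " + watch.ElapsedMilliseconds + " ms");
    }
}
```

Run it a few times in a Release build, since a single run includes JIT warm-up and can be misleading.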

0
votes

An upgrade of @Shane Oborn's answer that avoids the extra ArrayOfIndexes variable and uses a separate lock object.

I would use:

var lockObj = new object();
Parallel.ForEach(dt.AsEnumerable(), row =>
{
    lock(lockObj)
    {
        row["guid"] = sessionIDValue;
        row["addeddate"] = todayDt;     
    }
});

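You have to add these using directives. AsEnumerable() is an extension method in the System.Data namespace; on .NET Framework the project must also reference the System.Data.DataSetExtensions assembly.

using System.Data;
using System.Threading.Tasks;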