I'm trying to import around 900K rows of data and map it to my new data model. My problem is that the console application I have build for this import functionality slows down over time.
I have monitored the SQL queries and they all perform well (<5ms). I tried to import smaller chunks, fx 1K rows, at a time. The stopwatch timings looks like this:
- Count: 100 | Avg ms: 36
- Count: 200 | Avg ms: 67
- Count: 300 | Avg ms: 106
- Count: 400 | Avg ms: 145
- Count: 500 | Avg ms: 183
- Count: 600 | Avg ms: 222
- Count: 700 | Avg ms: 258
- Count: 800 | Avg ms: 299
- Count: 900 | Avg ms: 344
- Count: 1000 | Avg ms: 376
When restarting the application with a new chunk of 1K rows, the timings are similar.
The import data has the following format:
public class ImportData
{
public int Id { get; set; }
public int TaxpayerId { get; set; }
public string CustomerName { get; set; }
public string Email { get; set; }
public string PhoneNumber { get;set; }
}
A simplified example of my data model looks like this:
public class Channel
{
public int Id { get; set; }
public string Name { get; set; }
}
public class Permission
{
public Guid Id { get; set; }
public Channel Channel { get; set; }
public string Recipient { get; set; }
}
public class Taxpayer
{
public Guid Id { get; set; }
public int TaxpayerId { get; set; }
public string Name { get; set; }
public List<Permission> Permissions { get; set; }
}
My import method looks like this:
public void Import()
{
Stopwatch stopwatch = new Stopwatch();
//Get import data
List<ImportData> importDataList = _dal.GetImportData();
stopwatch.Start();
for (int i = 0; i < importDataList.Count; i++)
{
ImportData importData = importDataList[i];
Taxpayer taxpayer = new Taxpayer()
{
Name = importData.CustomerName,
TaxpayerId = importData.TaxpayerId,
Permissions = new List<Permission>()
};
//Does not call SaveChanges on the context
CreateTaxpayer(taxpayer, false);
//Create permissions
if (!string.IsNullOrWhiteSpace(importData.Email))
{
//Does not call SaveChanges on the context
CreatePermission(_channelIdEmail, importData.Email, taxpayer, PermissionLogType.PermissionRequestAccepted);
}
if (!string.IsNullOrWhiteSpace(importData.PhoneNumber))
{
//Does not call SaveChanges on the context
CreatePermission(_channelIdPhoneCall, importData.PhoneNumber, taxpayer, PermissionLogType.PermissionRequestAccepted);
//Does not call SaveChanges on the context
CreatePermission(_channelIdSms, importData.PhoneNumber, taxpayer, PermissionLogType.PermissionRequestAccepted);
}
if ((i + 1) % 100 == 0)
{
Console.WriteLine("Count: " + (i + 1) + " | Avg ms: " + stopwatch.ElapsedMilliseconds / 100);
stopwatch.Restart();
}
}
_dal.SaveChanges();
}
I tried the following:
- reduce the number of calls to SaveChanges (only called at once at the end)
- implement multithreading (without luck) - it does not seem to go hand in hand with entity framework
I am running out of ideas here. Do you guys have any suggestions to solve this performance issue?
BULK INSERTover a stream of rows is far faster, it also uses minimal logging. The transaction log doesn't record every single INSERT statement. It only logs enough data to rollback if needed. This results in a LOT less disk IO, log usage and fragmentation - Panagiotis Kanavos