This pertains to a project I am inheriting; I cannot change the table structure or the data access model. I have been asked to optimize the algorithm used to insert data into the database.
We have a dataset in table T. From that, we pull a set we will call A. We also query an XML feed and get a set we will call X.
- If a value from X is in A, the record in A must be updated to reflect the data in X.record
- If a value from X is not in A, X.record must be inserted into A
- If a value from A is not in X, A.record must be preserved in A
- X must be fully iterated through for all records, and A must be updated
All of these changes need to be written back to the database.
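The rules above amount to an upsert keyed on the GUID. A minimal in-memory sketch of just that logic follows; the `Record` class and its `Value` property are hypothetical stand-ins for the real entity, and no database is involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record shape; the real entity comes from table T.
public sealed class Record
{
    public Guid GUID { get; set; }
    public string Value { get; set; } = "";
}

public static class MergeSketch
{
    // Applies every record of X to A, keyed by GUID.
    // Records of A with no match in X are left untouched.
    public static Dictionary<Guid, Record> Merge(IEnumerable<Record> a, IEnumerable<Record> x)
    {
        var result = new Dictionary<Guid, Record>();
        foreach (var rec in a)
        {
            result[rec.GUID] = rec;
        }

        foreach (var item in x)
        {
            if (result.TryGetValue(item.GUID, out var existing))
            {
                existing.Value = item.Value;   // in both X and A: update A from X
            }
            else
            {
                result[item.GUID] = item;      // only in X: insert into A
            }
        }

        return result;                         // only in A: preserved as-is
    }
}
```

Each lookup against the dictionary is a hash probe rather than a scan or a query, so the whole pass is linear in the sizes of A and X.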
The algorithm as set up does the following:

    Query XML into a LIST
    foreach over the XML LIST
        look up foreach.item in A via LINQ, i.e.
            query = from record in A
                    where record.GUID == foreach.item.GUID
                    select record
        if query.Count() == 0
            insert into A (via context.AddToTableName(newTableNameObject))
        else
            var currentRecord = query.First()
            set all properties on currentRecord = properties from foreach.item
        context.SaveChanges()
I know this to be suboptimal. I tried to get A into an object (call it queryA) outside of the foreach loop, in an effort to move the query into memory and avoid hitting the disk, but after thinking that through, I realized the database was already in memory.
Having added timer objects to the algorithm, it's clear that what costs the most time is the SaveChanges() call. In some cases it takes 20ms, and in others, inexplicably, it jumps to 100ms.
I would prefer to only call the SaveChanges() one time. I can't figure out how to do that given my depth of knowledge of EF (which is thin at best) and the constraints of not being able to change the table structures and having to preserve data from A which is not in X.
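For illustration, here is a rough sketch of the shape I am after: load A once, index it by GUID, apply every item from X, and call SaveChanges() a single time after the loop. `FakeContext`, `FeedRecord`, and `AddToTable` are hypothetical stand-ins so that the sketch runs without a database; they are not the real EF types. With a real context, the assumption is that updated and added entities stay tracked until the single SaveChanges() call, and whether that actually removes most of the cost would need to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the real one is generated from table T.
public sealed class FeedRecord
{
    public Guid GUID { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical in-memory stand-in for the EF context, so the sketch runs
// on its own. SaveCount records how often SaveChanges() was called.
public sealed class FakeContext
{
    public List<FeedRecord> Table { get; } = new List<FeedRecord>();
    public int SaveCount { get; private set; }

    public void AddToTable(FeedRecord r) => Table.Add(r);  // stands in for context.AddToTableName(...)
    public void SaveChanges() => SaveCount++;              // stands in for context.SaveChanges()
}

public static class BatchedUpsert
{
    public static void Apply(FakeContext context, IEnumerable<FeedRecord> feed)
    {
        // Load A once and index it by GUID instead of querying per feed item.
        var byGuid = context.Table.ToDictionary(r => r.GUID);

        foreach (var item in feed)
        {
            if (byGuid.TryGetValue(item.GUID, out var current))
            {
                current.Name = item.Name;      // update the existing record
            }
            else
            {
                context.AddToTable(item);      // queue the insert
                byGuid[item.GUID] = item;      // a repeated GUID in the feed becomes an update
            }
        }

        context.SaveChanges();                 // one call, after the loop
    }
}
```

Records in A that never appear in X are never touched by the loop, so they are preserved without any extra handling.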
Suggestions?
MERGE within the target database? – user565869