0
votes

I have a table(table1) which has id as the primary key and incremental in nature. The table has Updatedtm which has the last updated date and time. The table has around 300 million records. I have another table table2 which is in sync with table1. What is the best way to delete from table2 when an id is removed from table1? Is Left join an efficient way, do i have to compare 300 million records everytime to check for deletes?

Delete from table2
where id not in(Select id from table1)
2
An outer join or not exists would almost always be better than a (not) IN. That said I recommend you test, then you will know. - Igor
What do you mean that table2 is in sync with table1? If there are records in table2 that do not have a corresponding entry in table1, it sounds like these tables are not in sync. Why is there not a foreign key relationship between these tables? - eaolson

2 Answers

0
votes

Make sure that you have an index that covers this query. I.e. make sure that table1.id and table2.id are indexed (yes, having this index will speed up the delete even thought the indexes will need to be updated.) This will help with the JOIN.

Also, you might want to look into batching your deletes,

WHILE <some_condition> BEGIN
    DELETE TOP (1000) t2
    FROM table2 t2
    LEFT OUTER JOIN table1 t1 ON t2.id = t1.id
    WHERE t1.id IS NULL
END

Batching your deletes will reduce the number of locks that SQL server will have to take out on your table, clustered, and non-clustered indexes. If this is a production server with 300-million rows, I would definitely look at your indexes and count the number of records that you think you might be deleting before coming up with a deletion strategy.

SELECT COUNT(*) FROM table2 t2
LEFT OUTER JOIN table1 t1 ON t2.id = t1.id
WHERE t1.id IS NULL

Also, contact any server admins to see what they think about potential locking issues.

0
votes

I would suggest you to use "Exists" of instead of "Not in"

Delete from table2 t2
where exists ( select 1 from table1 t1 where t1.id=t2.id)