DELETE and NOT IN query performance

Question

I have the following table structure:

Users

UserId (primary key)
UserName

SomeItems

SomeId (FK to Users.UserId)
SomeItemId (primary key)
Name
Other stuff...

SomeOtherItems

SomeId2 (FK to Users.UserId)
SomeOtherItemId (primary key)
Name
Other stuff...
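To make the structure concrete, here is a minimal T-SQL sketch of the three tables. The data types, lengths, `NOT NULL` on the FK columns, and all constraint and index names are assumptions, not part of the question, and the "Other stuff..." columns are left out. The two nonclustered indexes are relevant to the performance problem. SQL Server does not index foreign key columns automatically. Without these indexes, every existence check against SomeItems or SomeOtherItems may have to scan the whole child table. That includes the FK check SQL Server runs for each deleted Users row.

```sql
-- Sketch of the schema described above (SQL Server / T-SQL).
-- Assumed: column types/lengths, NOT NULL on the FK columns,
-- and every constraint/index name (all hypothetical).
CREATE TABLE dbo.Users
(
    UserId   int           NOT NULL CONSTRAINT PK_Users PRIMARY KEY,
    UserName nvarchar(100) NOT NULL
);

CREATE TABLE dbo.SomeItems
(
    SomeItemId int           NOT NULL CONSTRAINT PK_SomeItems PRIMARY KEY,
    SomeId     int           NOT NULL
               CONSTRAINT FK_SomeItems_Users REFERENCES dbo.Users (UserId),
    Name       nvarchar(100) NULL
);

CREATE TABLE dbo.SomeOtherItems
(
    SomeOtherItemId int           NOT NULL CONSTRAINT PK_SomeOtherItems PRIMARY KEY,
    SomeId2         int           NOT NULL
                    CONSTRAINT FK_SomeOtherItems_Users REFERENCES dbo.Users (UserId),
    Name            nvarchar(100) NULL
);

-- SQL Server does not create indexes on FK columns by itself.
-- These allow index seeks for existence checks on SomeId / SomeId2,
-- including the FK validation performed when Users rows are deleted.
CREATE INDEX IX_SomeItems_SomeId ON dbo.SomeItems (SomeId);
CREATE INDEX IX_SomeOtherItems_SomeId2 ON dbo.SomeOtherItems (SomeId2);
```

This is meant for an empty database. Against existing tables, only the two `CREATE INDEX` statements would apply.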

I want to delete the rows from the Users table whose UserId does not appear in either SomeItems or SomeOtherItems.

I can do this:

DELETE FROM Users
FROM Users u
WHERE u.UserId NOT IN
   (SELECT DISTINCT SomeId FROM SomeItems
    UNION
    SELECT DISTINCT SomeId2 FROM SomeOtherItems)

However, it is very slow. I assume the UNION query is executed for every row; is that right? Is there any way to improve the performance?

Comment by Aaron Bertrand: sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join

Aaron Bertrand · Accepted Answer · 2019-04-16T13:49:08

Applying DISTINCT to two results and then combining them with UNION (which applies yet a third DISTINCT, each requiring a sort or hash aggregate) is not the most efficient way to validate existence. How about:

DELETE u
  FROM dbo.Users AS u -- always use schema prefix!
  WHERE NOT EXISTS
  (
    SELECT 1 FROM dbo.SomeItems WHERE SomeId = u.UserId
  )
  AND NOT EXISTS
  (
    SELECT 1 FROM dbo.SomeOtherItems WHERE SomeId2 = u.UserId
  );
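Apart from performance, NOT EXISTS also treats NULLs differently from NOT IN. The self-contained T-SQL sketch below uses hypothetical temp tables and sample rows that are not from the question. It assumes the FK columns are nullable. Under the default ANSI_NULLS ON setting, a single NULL in the NOT IN list makes the predicate UNKNOWN for every user, so no user qualifies. The NOT EXISTS form still finds the orphaned user.

```sql
-- Hypothetical sample data; DROP TABLE IF EXISTS needs SQL Server 2016+.
DROP TABLE IF EXISTS #SomeItems, #SomeOtherItems, #Users;

CREATE TABLE #Users (UserId int PRIMARY KEY, UserName nvarchar(50));
CREATE TABLE #SomeItems (SomeItemId int PRIMARY KEY, SomeId int NULL);
CREATE TABLE #SomeOtherItems (SomeOtherItemId int PRIMARY KEY, SomeId2 int NULL);

INSERT #Users (UserId, UserName) VALUES (1, N'a'), (2, N'b'), (3, N'c');
INSERT #SomeItems (SomeItemId, SomeId) VALUES (10, 1), (11, NULL); -- item 11 has no user
INSERT #SomeOtherItems (SomeOtherItemId, SomeId2) VALUES (20, 2);

-- NOT IN: the list is {1, NULL, 2}. For user 3, "3 <> NULL" is UNKNOWN,
-- so the NOT IN predicate is never TRUE and no user qualifies.
SELECT COUNT(*) AS NotInMatches
  FROM #Users AS u
  WHERE u.UserId NOT IN (SELECT SomeId FROM #SomeItems
                         UNION
                         SELECT SomeId2 FROM #SomeOtherItems);  -- 0 with ANSI_NULLS ON

-- NOT EXISTS: a NULL SomeId never equals any UserId, so user 3 is deleted.
DELETE u
  FROM #Users AS u
  WHERE NOT EXISTS (SELECT 1 FROM #SomeItems WHERE SomeId = u.UserId)
    AND NOT EXISTS (SELECT 1 FROM #SomeOtherItems WHERE SomeId2 = u.UserId);

SELECT UserId FROM #Users ORDER BY UserId;  -- 1, 2
```

If both FK columns are declared NOT NULL, the two forms return the same rows and differ only in their execution plans.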
