0
votes

I have the following table structure:

Users

  • UserId (primary key)
  • UserName

SomeItems

  • SomeId (FK to Users.UserId)
  • SomeItemId (primary key)
  • Name
  • Other stuff...

SomeOtherItems

  • SomeId2 (FK to Users.UserId)
  • SomeOtherItemId (primary key)
  • Name
  • Other stuff...

I want to delete the records from the Users table that are not referenced in either the SomeItems or the SomeOtherItems table.

I can do this:

DELETE from Users
FROM Users u
WHERE u.UserId NOT IN
   (SELECT DISTINCT SomeId FROM SomeItems

    UNION

    SELECT DISTINCT SomeId2 FROM SomeOtherItems)

However, it is very slow. I assume it executes the UNION query for every record, doesn't it? Is there any way to improve the performance?

3
How many rows are there in those three tables? - Salman A

3 Answers

6
votes

Applying DISTINCT to both results and then combining them with UNION (which performs yet another duplicate-removal step, each of them typically needing a sort or hash) is not the most efficient way to check for existence. How about:

DELETE u
  FROM dbo.Users AS u -- always use schema prefix!
  WHERE NOT EXISTS
  (
    SELECT 1 FROM dbo.SomeItems WHERE SomeId = u.UserId
  )
  AND NOT EXISTS
  (
    SELECT 1 FROM dbo.SomeOtherItems WHERE SomeId2 = u.UserId
  );
2
votes

The simplest fix might just be to change UNION to UNION ALL.

You'll see the effect in the query plan, where the stream aggregate operator is removed.

After all, you don't care whether the list contains duplicates.
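
For illustration, the question's query with that change might look like the sketch below. It also drops the two DISTINCT keywords, since duplicates in the list do not affect the result of NOT IN. This assumes SomeId and SomeId2 never contain NULL; a single NULL in the list makes NOT IN match no rows at all.

```sql
-- Sketch only: the DELETE from the question, with UNION ALL and no DISTINCT.
-- Assumption: SomeId and SomeId2 are declared NOT NULL. If either column
-- can hold NULL, NOT IN evaluates to UNKNOWN and nothing is deleted.
DELETE u
FROM Users AS u
WHERE u.UserId NOT IN
(
    SELECT SomeId FROM SomeItems
    UNION ALL
    SELECT SomeId2 FROM SomeOtherItems
);
```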

1
votes

I would replace the IN with two joins; the simpler the query, the easier it is for the engine to optimize.

DELETE FROM U
FROM Users U
     LEFT JOIN SomeItems S1 ON S1.SomeId = U.UserId
     LEFT JOIN SomeOtherItems S2 ON S2.SomeId2 = U.UserId
WHERE S1.SomeId IS NULL AND S2.SomeId2 IS NULL

Checking that S1.SomeId is NULL means that U.UserId has no matching row in SomeItems. The same applies to S2.SomeId2 and SomeOtherItems.

Be sure that you have an index on SomeItems.SomeId and another on SomeOtherItems.SomeId2.
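
If those indexes do not exist yet, something along these lines could create them. This is a minimal sketch: the index names are made up for illustration, and the dbo schema is an assumption.

```sql
-- Hypothetical index names; the dbo schema is assumed.
-- Adjust both to match your database.
CREATE NONCLUSTERED INDEX IX_SomeItems_SomeId
    ON dbo.SomeItems (SomeId);

CREATE NONCLUSTERED INDEX IX_SomeOtherItems_SomeId2
    ON dbo.SomeOtherItems (SomeId2);
```

With these in place, the engine can look up each UserId by seeking the index rather than scanning the whole child table.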