0
votes

I have a table that contains duplicate names and emails but with different identifiers and other associated data. In this case, the identifier does not have to be unique. Here is an example of the data:

course user_code First Name Last Name Email Address
econ SKNKZ62HR28 John Doe [email protected]
lang C6NV4LQS5D4 John Doe [email protected]
science H9NNVNWVPM9 Fred Doe [email protected]
govt JVNH2BVTD3V Jane Doe [email protected]
math L2NK8S49N5G Jane Doe [email protected]

What I need to do is update the user_code so that John Doe has the same user_code for both courses:

course user_code First Name Last Name Email Address
econ SKNKZ62HR28 John Doe [email protected]
lang SKNKZ62HR28 John Doe [email protected]
science H9NNVNWVPM9 Fred Doe [email protected]
govt L2NK8S49N5G Jane Doe [email protected]
math L2NK8S49N5G Jane Doe [email protected]

I've been able to identify my duplicates with this code:

SELECT a.*
FROM attendees a
JOIN 
    (SELECT [Email Address], [Last Name], [First Name], COUNT(*) AS number
     FROM attendees 
     GROUP BY [Email Address], [Last Name], [First Name]
     HAVING count(*) > 1) b ON a.[Email Address] = b.[Email Address]
                            AND a.[Last Name] = b.[Last Name]
                            AND a.[First Name] = b.[First Name]
ORDER BY a.[Email Address]

I'm not sure of the next steps.

1
Thanks. Updated tag. SQL-Server - bobz

1 Answers

1
votes

If you just want a query to return a result set, you can use:

select a.*,
       max(user_code) over (partition by firstname, lastname, emailaddress) as imputed_user_code
from attendees a;

The best way to do an update depends on the database. The square braces suggest SQL Server, which has updatable CTEs. That allows:

with toupdate as (
      select a.*,
             max(user_code) over (partition by firstname, lastname, emailaddress) as imputed_user_code
      from attendees a
     )
update toupdate
     set user_code = imputed_user_code
     where user_code <> imputed_user_code;