Update table with different values from a different table which is grouped

Question

I have a oracle table called table1 that has a foreign key from table2. table2 has 2 columns: id, name table2.name has a lot of duplicates that need to be sorted out, so I grouped table2 by name and kept only 1 id for each name. However, table1.table2_id uses a lot of the duplicated fields.

How can I change all the table1.table2_id fields so that there are no duplicate names?

Currently: table1:

id	blabla	table2_id
1	row	1001
2	row	1002
3	row	1003
4	row	1004
5	row	1004
6	row	1005

table2:

id	name
1001	Bob
1002	Bob
1003	Bob
1004	Jack
1005	Jack

Desired Result: table1:

id	blabla	table2_id
1	row	1001
2	row	1001
3	row	1001
4	row	1004
5	row	1004
6	row	1004

table2:

id	name
1001	Bob
1004	Jack

So imo, I would need to:

update all table1.table2_id to top 1 table2.id grouped by name
delete duplicate rows from table2

And as there are thousands of duplicate fields in table2, I cannot use case for this..

The solution I am trying returns 1 id for each name, but I am failing at the part where I can update the table.

update table1 d
set d.table2_id  = 
(select
b.id
from table2 b
where b.name in (select min(id) keep(dense_rank first order by id) id
     , name
  from table2
 group by name)
order by b.id ASC
)
where d.table2_id in ( SELECT b.id FROM table2 b WHERE b.name in (select min(id) keep(dense_rank first order by id) id
     , name
  from table2
 group by name));

Please help :)

Judging from the keep(...) structure, you are using Oracle. If so, pls only tag your question with Oracle, not with several random database products! — Shadow

Petr Petr · Accepted Answer · 2021-01-26T19:43:41

Here are syntactically correct Oracle statements

update table1 t1
set t1.table2_id = (
  select new_id
  from (
    select id, min(id) over (partition by name) new_id
    from table2
  ) d
  where d.id = t1.table2_id
);

delete from table2
where id not in (
  select min(id) from table2 group by name
);

The analytic function min(id) over (partition by name) is used here so that you can have all original ids together with their new ids (the min ids from the set where the name is the same).

Update table with different values from a different table which is grouped

2 Answers