1
votes

A bug has arisen in my code which has allowed duplicate entries to appear in a database table, but with different casing. I am using MySQL.

e.g In this example, the second row shouldn't have been added, but it was.

    ID    Name          Description
-------------------------------------
    1    HELLO WORLD    Saying hello
    2    Hello world    Saying hello

To remedy the incorrect data, I want to delete the row where the full name is capitalised whereever a duplicate has occured. Can I do regex in SQL, is that the best solution, and how would I do it?

The table, in full, is as follows

'ADDRESSBOOK_GROUP', 'CREATE TABLE ADDRESSBOOK_GROUP ( ID int(10) unsigned NOT NULL default ''0'', NAME varchar(255) collate utf8_unicode_ci NOT NULL, DESCRIPTION text collate utf8_unicode_ci, ADDRESSBOOK_ID int(10) default NULL, TYPE int(10) unsigned NOT NULL default ''1'', PRIMARY KEY (ID), KEY ADDRESSBOOK_ID (ADDRESSBOOK_ID) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=DYNAMIC'

Edit - Also I forget to mention earlier that I only want the duplicate checks to occur when ADDRESSBOOK_ID is '6';

1
Which RDBMS? And show the table, too. - fancyPants
I've provided a simplified example of the table. - Andy A
So far I've looked at working out the SQL, but I'm unsure on the WHERE statement. I've no regex experience and have struggled to get a foothold in it. I presume its a fairly simple piece of regex once I know what I'm doing though (unless I don't need regex at all?). - Andy A

1 Answers

5
votes

the following selects all the upper case values where doubles exists

delete t.* from table_name t
inner join (
    select distinct t.id
    from table_name t
    left join table_name t2
        on t2.col_name = t.col_name
    where t.col_name LIKE BINARY UPPER(t.col_name)
) as a
    on a.id = t.id;

edit:

delete t.* from ADDRESSBOOK_GROUP t
inner join (
    select distinct t.ID
    from ADDRESSBOOK_GROUP t
    left join (
        select
            count(*) as n,
            NAME,
            ADDRESSBOOK_ID
        from ADDRESSBOOK_GROUP
        group by 
            NAME,
            ADDRESSBOOK_ID
    ) as t2
        on t2.name = t.name
        and t2.ADDRESSBOOK_ID = t.ADDRESSBOOK_ID
    left join ADDRESSBOOK_GROUP t_n
        on t_n.ID = t.ID
        and t_n.NAME REGEXP ('^[0-9]+$') 
    where 
        t.NAME like binary upper(t.NAME)
        and t_n.ID is null
        and t2.n > 1
) as a
    on a.ID = t.ID
where t.ADDRESSBOOK_ID = 6;