4
votes

I have a set of records (table [#tmp_origin]) containing duplicate entries in a string field ([Names]). I would like to insert the whole content of [#tmp_origin] into the destination table [#tmp_destination], that does NOT allow duplicates and may already contain items.

If the string in the origin table does not exist in the destination table, then in is simply inserted in the destination table, as is. If an entry in the destination table already exists with the same value of the entry in the original table, a string-ified incremental number must be appended to the string, before it is inserted in the destination table.

The process of moving data in this way has been implemented with a cursor, in this sample script:


-- create initial situation (origin and destination table, both containing items) - Begin

    CREATE TABLE [#tmp_origin] ([Names] VARCHAR(10))
    CREATE TABLE [#tmp_destination] ([Names] VARCHAR(10))
    CREATE UNIQUE INDEX [IX_UniqueName] ON [#tmp_destination]([Names] ASC)



    INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('b')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('c')


    INSERT INTO [#tmp_destination]([Names]) VALUES ('a')
    INSERT INTO [#tmp_destination]([Names]) VALUES ('a_1')
    INSERT INTO [#tmp_destination]([Names]) VALUES ('b')

-- create initial situation - End

    DECLARE @Name VARCHAR(10)

    DECLARE NamesCursor CURSOR LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY FOR
        SELECT [Names]
        FROM [#tmp_origin];
    OPEN NamesCursor;
    FETCH NEXT FROM NamesCursor INTO @Name;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @finalName VARCHAR(10)
        SET @finalName = @Name
        DECLARE @counter INT
        SET @counter = 1

        WHILE(1=1)
        BEGIN
            IF NOT EXISTS(SELECT * FROM [#tmp_destination] WHERE [Names] = @finalName)
                BREAK;

            SET @finalName = @Name + '_' + CAST(@counter AS VARCHAR)
            SET @counter = @counter + 1
        END
        INSERT INTO [#tmp_destination] ([Names]) (
            SELECT @finalName
        )

        FETCH NEXT FROM NamesCursor INTO @Name;
    END

    CLOSE NamesCursor;
    DEALLOCATE NamesCursor;




    SELECT *
    FROM [#tmp_destination]

    /*
    Expected result:
    a
    a_1
    a_2
    a_3
    b
    b_1
    c
    */

    DROP TABLE [#tmp_origin]
    DROP TABLE [#tmp_destination]


This works correctly, but its performance drastically slows down when the number of items to insert increases.

Any idea to speed it up?

thanks

3
Sample data and desired results would really help. A cursor is rarely the best way to process data.Gordon Linoff

3 Answers

5
votes

Using a windowing function allows the duplicates to be numbered. You can also get the count from the destination table (will need where condition to strip off the suffix you've added):

select orig.names,
       row_number() over (partition by orig.names order by orig.names) as rowNo,
       dest.count
from ##tmp_origin orig
  cross apply (select count(1) from #tmp_destination where names = orig.names) as dest

An insert can be built from the above (new suffix is rowNo + dest.count -1 if greater than zero).

Suggest you refactor the destination temporary table to include the name and suffix as separate columns – this might mean having a new intermediate stage – because this will make the matching logic much simpler.

1
votes

Something like this:

insert  [#tmp_destination]
select  CASE WHEN row_number() over(partition by Names order by Names) > 1 THEN Names + '_' + CONVERT(VARCHAR(10), row_number() over(partition by Names order by Names)) ELSE Names END
from    [#tmp_origin]
1
votes

I wouldn't use a cursor in that case. Instead, I would build the query using ROW_NUMBER(). This way you add a counter in your original table, and then use this counter to append to your [Names]:

SELECT [Names], ROW_NUMBER() OVER (PARTITION BY [Names] ORDER BY [Names]) - 1 AS [counter]
INTO #tmp_origin_with_counter
FROM #tmp_origin

SELECT CONCAT([Names], IIF([counter] = 0, '', '_'+ CAST([counter] AS NVARCHAR)))
INTO #tmp_destination
FROM #tmp_origin_with_counter