2
votes

Does SQL Server guarantee that NewSequentialId() will be called for each row in the order specified by the ORDER BY clause of an INSERT statement?

The goal is to take a list of objects in C#, each of which represents a row to be inserted in a table, and insert them into a table fairly quickly.

What I'm trying to do is insert the rows into a temporary table using SqlBulkCopy, then insert the rows from the temp table into a table that uses NewSequentialId(), then retrieve the new IDs in a way that they can be sorted in the same order as the list of objects in C#, so that the IDs can be attached to each corresponding object in C#.

I'm using SQL Server 2016 and this is the target table:

CREATE TABLE dbo.MyTable
(
    Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY DEFAULT NEWSEQUENTIALID(),
    SomeNonUniqueValue NVARCHAR(50) NOT NULL
)

First I use SqlBulkCopy to insert rows into this temp table. The RowOrder columns contains an integer generated in the application. RowOrder is the order I need the generated IDs to be returned in. In the application, RowOrder is the index of each C# object in the list.

CREATE TABLE #MyTableStaging
(
    RowOrder INT NOT NULL,
    SomeNonUniqueValue NVARCHAR(50) NOT NULL
)

Then I run this SQL to take the rows from #MyTableStaging, insert them into MyTable and retrieve the inserted IDs.

DECLARE @MyTableOutput TABLE
(
    Id UNIQUEIDENTIFIER NOT NULL
)

INSERT INTO dbo.MyTable (SomeNonUniqueValue)
OUTPUT Inserted.Id INTO @MyTableOutput(Id)
SELECT SomeNonUniqueValue 
FROM #MyTableStaging
ORDER BY RowOrder

SELECT Id FROM @MyTableOutput ORDER BY Id

In all my testing this works. However, I recently found out that the order in which rows are inserted into the table specified in the OUTPUT clause is not always the same as the order specified by the ORDER BY in the INSERT statement (I found this because the original design of this system was to use an identity in #MyTableStaging, and rather than ordering by #MyTableStaging.Id I was ordering by the identity column).

I know that SQL Server guarantees that identity values are generated in the order specified in the ORDER BY clause of an INSERT statement (from https://docs.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-2017#limitations-and-restrictions):

INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted.

1

1 Answers

1
votes

The easiest (and likely most efficient) way would be to insert into the target MyTable table directly without the intermediate staging table. I'd use a table-valued-parameter to pass the table of values into your stored procedure.

https://docs.microsoft.com/en-us/sql/relational-databases/tables/use-table-valued-parameters-database-engine?view=sql-server-2017

https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/table-valued-parameters


If you really want to use the staging table, you can't rely on the order of rows returned by the OUTPUT clause. You need to store the explicit mapping between the #MyTableStaging.RowOrder and the generated MyTable.Id. When you use OUTPUT clause in a simple INSERT statement you can't include columns from the source table into the output. There is a workaround. You can use MERGE instead of INSERT and OUTPUT clause of the MERGE statement allows columns from source table.

See very similar question Combine OUTPUT inserted.id with value from selected row

MERGE can INSERT, UPDATE and DELETE rows. In our case we need only to INSERT. 1=0 is always false, so the NOT MATCHED BY TARGET part is always executed. In general, there could be other branches, see docs. WHEN MATCHED is usually used to UPDATE; WHEN NOT MATCHED BY SOURCE is usually used to DELETE, but we don't need them here.

This convoluted form of MERGE is equivalent to simple INSERT, but unlike simple INSERT its OUTPUT clause allows to refer to the columns that we need. It allows to retrieve columns from both source and destination tables thus saving a mapping between old and new IDs.

DECLARE @MyTableOutput TABLE
(
    OldRowOrder int NOT NULL
    ,NewID UNIQUEIDENTIFIER NOT NULL
);


MERGE INTO dbo.MyTable
USING
(
    SELECT RowOrder, SomeNonUniqueValue
    FROM #MyTableStaging
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT (SomeNonUniqueValue)
VALUES (Src.SomeNonUniqueValue)
OUTPUT Src.RowOrder AS OldRowOrder, inserted.ID AS NewID
INTO @MyTableOutput(OldRowOrder, NewID)
;

If your DBA is so afraid of MERGE you don't have to use it. It will be less efficient, though.

Simply insert all rows.

INSERT INTO dbo.MyTable (SomeNonUniqueValue)
SELECT SomeNonUniqueValue 
FROM #MyTableStaging
;

We don't care about the order.

If SomeNonUniqueValue were unique, you could just join on this column to map RowOrder to Id. Since these values are not unique, we'll need an extra step and generate unique row numbers for joining.

WITH
CTE_Dst
AS
(

    SELECT
        Id
        ,SomeNonUniqueValue
        ,ROW_NUMBER() OVER (ORDER BY SomeNonUniqueValue) AS rn
    FROM dbo.MyTable
)
,CTE_Src
AS
(

    SELECT
        RowOrder
        ,SomeNonUniqueValue
        ,ROW_NUMBER() OVER (ORDER BY SomeNonUniqueValue) AS rn
    FROM #MyTableStaging
)
SELECT
    CTE_Dst.Id
    ,CTE_Src.RowOrder
FROM
    CTE_Dst
    INNER JOIN CTE_Src ON CTE_Src.rn = CTE_Dst.rn
;

If you have, say, three rows with the same SomeNonUniqueValue it doesn't really matter how you map these rows together, because SomeNonUniqueValue is the same.

Example:

#MyTableStaging
+----------+--------------------+
| RowOrder | SomeNonUniqueValue |
+----------+--------------------+
|        1 | qwerty             |
|        2 | qwerty             |
|        3 | qwerty             |
|        4 | asdf               |
|        5 | asdf               |
+----------+--------------------+

MyTable
+----+--------------------+
| ID | SomeNonUniqueValue |
+----+--------------------+
| A  | qwerty             |
| B  | qwerty             |
| C  | qwerty             |
| D  | asdf               |
| E  | asdf               |
+----+--------------------+

You can map them like this:

+----------+----+--------------------+
| RowOrder | ID | SomeNonUniqueValue |
+----------+----+--------------------+
|        1 | A  | qwerty             |
|        2 | B  | qwerty             |
|        3 | C  | qwerty             |
|        4 | D  | asdf               |
|        5 | E  | asdf               |
+----------+----+--------------------+

Or, you can map them like this:

+----------+----+--------------------+
| RowOrder | ID | SomeNonUniqueValue |
+----------+----+--------------------+
|        1 | B  | qwerty             |
|        2 | C  | qwerty             |
|        3 | A  | qwerty             |
|        4 | E  | asdf               |
|        5 | D  | asdf               |
+----------+----+--------------------+

It is still a valid mapping, because all three values of qwerty are the same. Neither of these mapping is "more correct" than another.

Obviously, if your MyTable wasn't empty before the INSERT, you need to select new rows only.