Version 1
INSERT INTO table_a (col_1, col_2)
SELECT DISTINCT col_1, col_2
FROM table_b b
WHERE b.col_1 IS NOT NULL
AND b.col_2 IS NOT NULL
AND b.id NOT IN
(
SELECT b.id
FROM table_b b
JOIN table_a a WITH(NOLOCK)
ON b.col_1 = a.col_1
AND b.col_2 = a.col_2.
)
Version 2
SELECT * INTO #temp FROM
(
SELECT DISTINCT col_1, col_2
FROM table_b
WHERE b.col_1 IS NOT NULL
AND b.col_2 IS NOT NULL
AND b.id NOT IN
(
SELECT b.id
FROM table_b b
JOIN table_a a WITH(NOLOCK)
ON b.col_1 = a.col_1
AND b.col_2 = a.col_2
)) t
INSERT INTO table_a (col_1, col_2) SELECT col_1, col_2 FROM #temp
col_1: varchar col_2: varchar
table_a: clustered on primary key sequential id, non-clustered unique on col_1 include col_2, non-clustered unique on col_2 include col_1
Version 1 I/O
Table 'table_b'. Scan count 34, logical reads 118, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'table_a'. Scan count 0, logical reads 109404, physical reads 8761, read-ahead reads 7761, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(4997 row(s) affected)
Version 2 I/O
Table 'table_b'. Scan count 34, logical reads 118, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'table_a'. Scan count 0, logical reads 35454, physical reads 0, read-ahead reads 5435, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table '#temp_______________________________________________________________________________________________________________00000044D848'. Scan count 0, logical reads 5045, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(4994 row(s) affected)
(1 row(s) affected) Table 'table_a'. Scan count 0, logical reads 105486, physical reads 331, read-ahead reads 5940, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 2, logical reads 10412, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table '#temp_______________________________________________________________________________________________________________00000044D848'. Scan count 1, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(4994 row(s) affected)
Version 1 is significantly slow that version 2, I assume due to the higher physical read count.
Are there any obvious reasons why the physical read count would be so different, superficially it appears as though both versions should be doing the same work.
I have ran these numerous time with hot and cold and cache and seen consistent behavour across both versions.