3
votes

I have a table with some names in a row. For each row I want to generate a random name. I wrote the following query to:

BEGIN transaction t1

Create table TestingName
(NameID int,
 FirstName varchar(100),
 LastName varchar(100)
)

INSERT INTO TestingName
SELECT 0,'SpongeBob','SquarePants' 
UNION 
SELECT 1, 'Bugs', 'Bunny' 
UNION 
SELECT 2, 'Homer', 'Simpson' 
UNION 
SELECT 3, 'Mickey', 'Mouse' 
UNION 
SELECT 4, 'Fred', 'Flintstone'

SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5

ROLLBACK Transaction t1

The problem is the "ABS(CHECKSUM(NEWID())) % 5" portion of this query sometime returns more than 1 row and sometimes returns 0 rows. I must be missing something but I can't see it.

If I change the query to

DECLARE @n int
set @n= ABS(CHECKSUM(NEWID())) % 5

SELECT FirstName from TestingName
WHERE NameID = @n

Then everything works and I get a random number per row.

If you take the query above and paste it into SQL management studio and run the first query a bunch of times you will see what I am attempting to describe.

The final update query will look like

Update TableWithABunchOfNames
set [FName] = (SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5) 

This does not work because sometimes I get more than 1 row and sometimes I get no rows.

What am I missing?

3
I have edited your title. Please see, "Should questions include “tags” in their titles?", where the consensus is "no, they should not".John Saunders

3 Answers

1
votes

The problem is that you are getting a different random value for each row. That is the problem. This query is probably doing a full table scan. The where clause is executed for each row -- and a different random number is generated.

So, you might get a sequence of random numbers where none of the ids match. Or a sequence where more than one matches. On average, you'll have one match, but you don't want "on average", you want a guarantee.

This is when you want rand(), which produces only one random number per query:

SELECT FirstName
from TestingName
WHERE NameID = floor(rand() * 5);

This should get you one value.

1
votes

Why not use top 1?

Select top 1 firstName
From testingName
Order by newId()
1
votes

This worked for me:

WITH
CTE
AS
(
    SELECT
        ID
        ,FName
        ,CAST(5 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) AS int) AS rr
    FROM
        dbo.TableWithABunchOfNames
)
,CTE_ForUpdate
AS
(
    SELECT
        CTE.ID
        , CTE.FName
        , dbo.TestingName.FirstName AS RandomName
    FROM
        CTE
        LEFT JOIN dbo.TestingName ON dbo.TestingName.NameID = CTE.rr
)
UPDATE CTE_ForUpdate
SET FName = RandomName
;

This solution depends on how smart optimizer is.

For example, if I use INNER JOIN instead of LEFT JOIN (which is the correct choice for this query), optimizer would move calculation of random numbers outside the join loop and end result would be not what we expect.

I created a table TestingName with 5 rows as in the question and a table TableWithABunchOfNames with 100 rows.

Here is the execution plan with LEFT JOIN. You can see the Compute scalar that calculates random numbers is done before the join loop. You can see that 100 rows were updated:

left join

Here is the execution plan with INNER JOIN. You can see the Compute scalar that calculates random numbers is done after the join loop and with extra filter. This query may update not all rows in TableWithABunchOfNames and some rows in TableWithABunchOfNames may be updated several times. You can see that Filter left 102 rows and Stream aggregate left only 69 rows. It means that only 69 rows were eventually updated and also there were multiple matches for some rows (102 - 69 = 33).

inner join


To guarantee that the result is what you expect you should generate random number for each row in TableWithABunchOfNames and explicitly remember the result, i.e. materialize the CTE shown above. Then use this temporary result to join with the table TestingName.

You can add a column to TableWithABunchOfNames to store generated random numbers or save CTE to a temp table or table variable.