In SQL Server, nvarchar values represent strings of Unicode code-points - I understand, by default, using UTF-16, values beyond 0xFFFF are represented as surrogate-pairs.
I wanted to set a default string value for an nvarchar UDF parameter which would contain special characters. T-SQL does not let you use hex-escape sequences in string literals, you must use the CHAR() OR NCHAR() functions to specify characters by their codepoint values, however you must use a literal for parameter default values: you cannot use NCHAR(). However I remembered that SQL Server also performs implicit conversion from varbinary to nvarchar, so:
CREATE FUNCTION DoSomething(
@foo nvarchar(50) = '\x0008', -- not supported by T-SQL syntax
@bar nvarchar(50) = NCHAR(8), -- forbidden: defaults must be a literal
@baz nvarchar(50) = 0x008 -- success!
)
I wanted to change the parameters to represent a comparison range, and I wanted the default values to represent the widest-possible range of values, thus letting me use Static SQL for a search function without needing OPTION(RECOMPILE) or the now-discredited (@foo IS NULL OR Table.Foo = @foo) pattern.
So I changed my function to this:
CREATE FUNCTION DoSomething(
@fooMin nvarchar(50) = 0x0000,
@fooMax nvarchar(50) = 0xFFFF
)
/* SELECT goes here */
WHERE
Foo BETWEEN @fooMin AND @fooMax
I reasoned 0xFFFF would be high enough to accommodate any (practical) unicode text thrown at the system I was building.
However, to my surprise, the BETWEEN operator always returned false. I wondered if something might be up with the upper-bound operand so I changed it to 0x7FFF and it worked fine.
I tried 0x8FFF next and that also worked.
But 0x9FFF and then 0x9000 failed.
To my knowledge, there is nothing special about the 0x8FFF - 0x9000 boundary in Unicode. Wikipedia reports the Basic Multilingual Plane occupies 0x0000 - 0xFFFF and that 0x900 is simply another block in the CJK area: https://en.wikipedia.org/wiki/Plane_(Unicode)#/media/File:Roadmap_to_Unicode_BMP.svg and the UTF-16 surrogates start at 0xD800 and 0xDC00 - far away from 0x900.
Here's my test-case:
SELECT N'HELLO', 0xFF, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0xFF THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x0FFF, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x0FFF THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x1000, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x1000 THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x6000, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x6000 THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x6FFF, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x6FFF THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x7000, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x7000 THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x7FFF, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x7FFF THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x8000, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x8000 THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x8FFF, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x8FFF THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x9000, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x9000 THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0x9FFF, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0x9FFF THEN 'yup' ELSE 'no' END ) UNION ALL SELECT N'HELLO', 0xFFFF, ( CASE WHEN N'HELLO' BETWEEN 0x0000 AND 0xFFFF THEN 'yup' ELSE 'no' END )
And my results:
HELLO 0xFF yup HELLO 0x0FFF no HELLO 0x1000 no HELLO 0x6000 no HELLO 0x6FFF yup HELLO 0x7000 yup HELLO 0x7FFF yup HELLO 0x8000 no HELLO 0x8FFF yup HELLO 0x9000 no HELLO 0x9FFF no HELLO 0xFFFF no
So it seems that it isn't just the 0x7FFF - 0x8000 boundary, but other boundaries too.
I wondered if it might be because it's interpreting the binary literals as little-endian instead of big-endian, but then all of the literals ending in **FF would return true because they're greater than N'H'.