9
votes

Is there a function in SQL Server to normalize a unicode string? e.g.

UPDATE Orders SET Notes = NormalizeString(Notes, 'FormC')

Unicode Normalization Forms:

  • C​omposition (C): A + ¨ becomes Ä
  • D​ecomposition (D): Ä becomes A + ¨
  • Compatible Composition (KC): A + ¨ + + n becomes Ä + f + i + n
  • Compatible Decomposition (KD): Ä + + n becomes A + ¨ + f + i + n

i cannot find any built-in function, so i assume there is none.


Ideally, if there can be only one, then i happen to need Form C today:

Unicode normalization form C, canonical composition. Transforms each decomposed grouping, consisting of a base character plus combining characters, to the canonical precomposed equivalent. For example, A + ¨ becomes Ä.

See also

4

4 Answers

7
votes

Sorry, no, there is no such function in any version of SQL Server to date (2012 test builds). Comparisons can be correctly composition-insensitive, but there isn't a function to convert character composition usage to one normal form.

It has been suggested for a future version of the ANSI standard under the syntax NORMALIZE(string, NFC) but it's going to be a long time before this makes it to the real world. For now if you want to do normalisation you'll have to do it in a proper programming language with better string-handling capabilities, either by pulling the string out of the database or by writing a CLR stored procedure to do it.

1
votes

try this CLR function

using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString fnRemoveDiacritics(string text)
    {
        string stringFormD = text.Normalize(System.Text.NormalizationForm.FormD);
        System.Text.StringBuilder retVal = new System.Text.StringBuilder();
        for (int index = 0; index < stringFormD.Length; index++)
        {
            if (System.Globalization.CharUnicodeInfo.GetUnicodeCategory(stringFormD[index]) != System.Globalization.UnicodeCategory.NonSpacingMark)
                retVal.Append(stringFormD[index]);
        }
        return retVal.ToString().Normalize(System.Text.NormalizationForm.FormC);
    }
}

and in SQL

SELECT dbo.fnRemoveDiacritics('Äěščřžýáíé')
-- Returns: Aescrzyaie

thanks to http://www.dotnetportal.cz/blogy/4/Tomas-Jecha/663/NET-Tip-6-Ciste-odstraneni-diakritiky

1
votes

I was some problem,

I write a new CRL function in C#, and use like a SQL function.

My C# code (nomalize to NFC or NFD). Also, handle NULL string correctly.:

using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Text;

    public static class CLRUnicodeNormalize
    {
        [SqlFunction]
        public static SqlString UnicodeNormalizeC(SqlString inStr)
        {
            if (inStr.IsNull) return SqlString.Null;
            return inStr.ToString().Normalize(NormalizationForm.FormC);
        }
        public static SqlString UnicodeNormalizeD(SqlString inStr)
        {
            if (inStr.IsNull) return SqlString.Null;
            return inStr.ToString().Normalize(NormalizationForm.FormD);
        }
    }

Use example in SQL query (form any normalization to C):

UPDATE o SET o.ObjectName=dbo.UnicodeNormalizeC(o.ObjectName)

Installing the normalization function before use (you don't need C# in this case... The binary conatains the use-ready .Net code, see source above):

/* 
GO
DROP FUNCTION [dbo].UnicodeNormalize
GO
DROP ASSEMBLY [CLRUnicodeNormalize]
GO
*/


CREATE ASSEMBLY [CLRUnicodeNormalize]
FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C010300747F4BC90000000000000000E00022200B013000000A0000000600000000000076280000002000000040000000000010002000000002000004000000000000000600000000000000008000000002000000000000030060850000100000100000000010000010000000000000100000000000000000000000242800004F00000000400000C803000000000000000000000000000000000000006000000C00000070270000380000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E746578740000007C08000000200000000A000000020000000000000000000000000000200000602E72737263000000C80300000040000000040000000C0000000000000000000000000000400000402E72656C6F6300000C000000006000000002000000100000000000000000000000000000400000420000000000000000000000000000000058280000000000004800000002000500A4200000CC0600000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000A20F00281000000A2C067E1100000A2A0F00FE16120000016F1200000A176F1300000A281400000A2AA20F00281000000A2C067E1100000A2A0F00FE16120000016F1200000A186F1300000A281400000A2A000042534A4201000100000000000C00000076342E302E33303331390000000005006C00000014020000237E0000800200000803000023537472696E6773000000008805000004000000235553008C0500001000000023475549440000009C0500003001000023426C6F620000000000000002000001471500000900000000FA0133001600000100000014000000020000000200000002000000140000000F00000001000000020000000000FE0101000000000006003201A50206009F01A5020600510073020F00C5020000060079003A02060015013A020600E1003A02060086013A02060052013A0206006B013A02060090003A020600650086020600430086020600C4003A020600AB00D1010600E90221020A0000014C020A00EB01D4020600F701210206002802FC02000000000100000000000100010081011000BD01000041000100010050200000000096000A003B00010079200000000096001C003B000200000001006D02000001006D02090067020100110067020600190067020A00290067021000310067021000390067021000410067021000490067021000510067021000590067021000610067021500690067021000710067021000790067021000890067020600910016021A0091001C021E008100F50122009900C70126009100F0022C0020007B002A012E000B0042002E0013004B002E001B006A002E00230073002E002B008B002E0033008B002E003B008B002E00430073002E004B0091002E0053008B002E005B008B002E006300A9002E006B00D3002E007300E000048000000100000000000000000000000000BD01000004000000000000000000000032003A000000000004000000000000000000000032002E00000000000000003C4D6F64756C653E00556E69636F64654E6F726D616C697A654300556E69636F64654E6F726D616C697A65440053797374656D2E44617461006D73636F726C696200477569644174747269627574650044656275676761626C6541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C795469746C6541747472696275746500417373656D626C7954726164656D61726B417474726962757465005461726765744672616D65776F726B41747472696275746500417373656D626C7946696C6556657273696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E4174747269627574650053716C46756E6374696F6E41747472696275746500417373656D626C794465736372697074696F6E41747472696275746500436F6D70696C6174696F6E52656C61786174696F6E7341747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C79436F6D70616E794174747269627574650052756E74696D65436F6D7061746962696C69747941747472696275746500434C52556E69636F64654E6F726D616C697A650053797374656D2E52756E74696D652E56657273696F6E696E670053716C537472696E6700546F537472696E6700434C52556E69636F64654E6F726D616C697A652E646C6C006765745F49734E756C6C0053797374656D004E6F726D616C697A6174696F6E466F726D0053797374656D2E5265666C656374696F6E004D6963726F736F66742E53716C5365727665722E536572766572002E63746F7200696E5374720053797374656D2E446961676E6F73746963730053797374656D2E52756E74696D652E496E7465726F7053657276696365730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300446562756767696E674D6F6465730053797374656D2E446174612E53716C5479706573004F626A656374006F705F496D706C696369740053797374656D2E54657874000000000042B7983A5EC3D848A69F95A94C809E3900042001010803200001052001011111042001010E042001010203200002030611490320000E0520010E115105000111490E08B77A5C561934E089060001114911490801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010801000200000000001701001253514C4E6F726D616C697A65537472696E67000005010000000017010012436F7079726967687420C2A920203230313900002901002434303635643437642D343839642D343638332D626233652D32393963316632623536626400000C010007312E302E302E3000004901001A2E4E45544672616D65776F726B2C56657273696F6E3D76342E360100540E144672616D65776F726B446973706C61794E616D65122E4E4554204672616D65776F726B20342E3604010000000000000000A704E49600000000020000007C000000A8270000A809000000000000000000000000000010000000000000000000000000000000525344534526D8C335B5074C819DB6DF9CC26A3701000000433A5C55736572735C537A395C736F757263655C7265706F735C53514C4E6F726D616C697A65537472696E675C53514C4E6F726D616C697A65537472696E675C6F626A5C52656C656173655C434C52556E69636F64654E6F726D616C697A652E706462004C28000000000000000000006628000000200000000000000000000000000000000000000000000058280000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF250020001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000018000080000000000000000000000000000001000100000030000080000000000000000000000000000001000000000048000000584000006C03000000000000000000006C0334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000001000000000000000100000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B004CC020000010053007400720069006E006700460069006C00650049006E0066006F000000A802000001003000300030003000300034006200300000001A000100010043006F006D006D0065006E007400730000000000000022000100010043006F006D00700061006E0079004E0061006D00650000000000000000004E0013000100460069006C0065004400650073006300720069007000740069006F006E0000000000530051004C004E006F0072006D0061006C0069007A00650053007400720069006E00670000000000300008000100460069006C006500560065007200730069006F006E000000000031002E0030002E0030002E003000000050001800010049006E007400650072006E0061006C004E0061006D006500000043004C00520055006E00690063006F00640065004E006F0072006D0061006C0069007A0065002E0064006C006C0000004800120001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020002000320030003100390000002A00010001004C006500670061006C00540072006100640065006D00610072006B00730000000000000000005800180001004F0072006900670069006E0061006C00460069006C0065006E0061006D006500000043004C00520055006E00690063006F00640065004E006F0072006D0061006C0069007A0065002E0064006C006C000000460013000100500072006F0064007500630074004E0061006D00650000000000530051004C004E006F0072006D0061006C0069007A00650053007400720069006E00670000000000340008000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0030002E0030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000783800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
WITH PERMISSION_SET = UNSAFE
GO

CREATE FUNCTION [dbo].UnicodeNormalizeC(@inStr nvarchar(max)) RETURNS nvarchar(max) 
AS EXTERNAL NAME [CLRUnicodeNormalize].[CLRUnicodeNormalize].[UnicodeNormalizeC]
GO
CREATE FUNCTION [dbo].UnicodeNormalizeD(@inStr nvarchar(max)) RETURNS nvarchar(max) 
AS EXTERNAL NAME [CLRUnicodeNormalize].[CLRUnicodeNormalize].[UnicodeNormalizeD]
GO

Compiled with .Net version: 4.6

I don't take any resposinbility the use of the binary code above!

Test example:

DECLARE @str_nfd nvarchar(10) = dbo.UnicodeNormalizeD('á');
SELECT LEN(@str_nfd);   -- NFD normalization, return: 2
SELECT LEN(dbo.UnicodeNormalizeC(@str_nfd)); -- back to NFC, return: 1
1
votes

In the absence of an in-built solution in SQL Server and if you don't want to write a C# function, you could use the translate function to do that manually as such:

select translate(last_name, 'éêëèàäçïîìôöòûù', 'eeeeaaciiiooouu') from employees

You can also use the following solution from: https://coderwall.com/p/a6koxq/how-to-remove-diacritics-in-sql-server

The easiest way to remove diacritics from a string in SQL Server is to collate the string using a character set that does not include diacritics, such as :

SELECT 'àéêöhello!' Collate SQL_Latin1_General_CP1253_CI_AI

This will output:

aeeohello!

This work only if you are not using a unicode string, so cast it as varchar first if you have a unicode string.