Collation under Unicode Technical Standard #10, the Unicode Collation Algorithm (UCA), which is a separate thing from being Unicode compliant, in case you were wondering, covers not only ordering/sorting but also comparison: the question "is string 1 equal to string 2?". Sometimes code points that differ between two strings are to be considered equal for collation and comparison purposes; at least, that is what this blog post, written from a Perl standard library perspective, implies.
What I want to know is: (a) does Delphi XE2 already fully implement the entire Unicode Collation Algorithm, and (b) if not, does a third-party library do so?
Sample code:
Str1 := Chr($212B); // U+212B ANGSTROM SIGN
Str2 := Chr($C5);   // U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
n := CompareStr(Str1, Str2); // nonzero in Delphi; under UCA rules it should be 0
According to the Unicode Collation Algorithm, these two code points are canonically equivalent (U+212B has the canonical decomposition U+00C5), so they should compare as equal. That makes no sense from a binary point of view, so I'm glad that neither CompareStr in Delphi nor cmp in Perl (from the linked article) is polluted with Unicode glitches. But what if you want Unicode-compliant collation in Delphi, like Perl's Unicode::Collate module? How would you do it?
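One piece of what UCA requires, treating canonically equivalent strings as equal, can be approximated on Windows Vista and later by decomposing both strings to Normalization Form D before an ordinal comparison. The sketch below assumes the Win32 `NormalizeString` function exported by Normaliz.dll, declared by hand because XE2's Windows unit may not include it; `NormalizeNFD` and `CanonicalCompareStr` are hypothetical helper names. It is not a UCA implementation: there are no collation weights, strength levels, or tailorings, so non-equal strings still sort in code-point order.

```pascal
program CanonicalCompare;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  NormalizationD = 2; // NORM_FORM value for Unicode Normalization Form D

// Win32 API (Windows Vista and later), declared by hand as an assumption for XE2.
function NormalizeString(NormForm: Integer; lpSrcString: PWideChar;
  cwSrcLength: Integer; lpDstString: PWideChar;
  cwDstLength: Integer): Integer; stdcall; external 'Normaliz.dll';

// Hypothetical helper: returns S converted to Normalization Form D.
function NormalizeNFD(const S: string): string;
var
  Len: Integer;
begin
  if S = '' then
    Exit('');
  // A zero-length destination asks the API for an estimated buffer size.
  Len := NormalizeString(NormalizationD, PChar(S), Length(S), nil, 0);
  if Len <= 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  // The estimate is normally sufficient; robust code would retry on
  // ERROR_INSUFFICIENT_BUFFER, which is omitted here for brevity.
  Len := NormalizeString(NormalizationD, PChar(S), Length(S), PChar(Result), Len);
  if Len <= 0 then
    RaiseLastOSError;
  SetLength(Result, Len); // trim to the actual normalized length
end;

// Hypothetical helper: ordinal comparison after canonical decomposition.
function CanonicalCompareStr(const S1, S2: string): Integer;
begin
  Result := CompareStr(NormalizeNFD(S1), NormalizeNFD(S2));
end;

var
  Str1, Str2: string;
begin
  Str1 := #$212B; // ANGSTROM SIGN
  Str2 := #$00C5; // LATIN CAPITAL LETTER A WITH RING ABOVE
  Writeln('CompareStr:          ', CompareStr(Str1, Str2));
  Writeln('CanonicalCompareStr: ', CanonicalCompareStr(Str1, Str2));
end.
```

Both code points decompose to U+0041 U+030A under NFD, so the second comparison is expected to return 0 while the plain ordinal CompareStr does not. For full UCA ordering (multi-level weights, contractions, tailorings) a dedicated collation library would still be needed.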
Update: AnsiCompareStr calls the Win32 CompareString function and handles some locale-specific cases like the one above. From reading around the internet, classic Windows Unicode collation behaviour and UCA are converging slowly but not completely, and it seems to be UCA that is being changed to make it more like Windows collation.
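To see how the Windows path behaves on a given machine, the ordinal and locale-aware comparisons can be run side by side. This is a sketch under assumptions: `WinCompare` is a hypothetical wrapper around the Win32 CompareString declared in Delphi's Windows unit, and the flagged call is only loosely analogous to comparing at a lower UCA strength (NORM_IGNORECASE and NORM_IGNORENONSPACE ignore case and nonspacing marks, which is not the same as UCA's level model). No output is claimed here, since the result depends on the Windows version and user locale.

```pascal
program CompareStringDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  CmpEqual = 2; // value of CSTR_EQUAL returned by CompareString

// Hypothetical helper: calls CompareString and maps its result to <0 / 0 / >0.
function WinCompare(const S1, S2: string; Flags: DWORD): Integer;
begin
  // CompareString returns 1 (less), 2 (equal), 3 (greater), or 0 on failure.
  Result := CompareString(LOCALE_USER_DEFAULT, Flags,
    PChar(S1), Length(S1), PChar(S2), Length(S2));
  if Result = 0 then
    RaiseLastOSError;
  Result := Result - CmpEqual;
end;

var
  Str1, Str2: string;
begin
  Str1 := #$212B; // ANGSTROM SIGN
  Str2 := #$00C5; // LATIN CAPITAL LETTER A WITH RING ABOVE
  Writeln('CompareStr (ordinal):    ', CompareStr(Str1, Str2));
  Writeln('AnsiCompareStr:          ', AnsiCompareStr(Str1, Str2));
  Writeln('CompareString, no flags: ', WinCompare(Str1, Str2, 0));
  // Ignore case and nonspacing marks, loosely similar to a lower strength.
  Writeln('CompareString, ignore case/nonspace: ',
    WinCompare(Str1, Str2, NORM_IGNORECASE or NORM_IGNORENONSPACE));
end.
```

Comparing the AnsiCompareStr line with the direct CompareString call with no flags is a quick way to confirm that AnsiCompareStr is delegating to the Windows comparison rather than doing an ordinal check.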