Unicode scalar value in Swift

Question

In The Swift Programming Language (Swift 3.0), the chapter on strings and characters states:

A Unicode scalar is any Unicode code point in the range U+0000 to U+D7FF inclusive or U+E000 to U+10FFFF inclusive. Unicode scalars do not include the Unicode surrogate pair code points, which are the code points in the range U+D800 to U+DFFF inclusive

What does this mean?

A Unicode scalar value is any Unicode code point (that is, any entry in the Unicode code space) that is not a surrogate. A surrogate is a code point that has no meaning on its own and is only used in pairs, in UTF-16, to encode a character outside the Basic Multilingual Plane, for example the Han character 𠀑 (U+20011). Note that a Unicode scalar value is not necessarily a meaningful character on its own: the character à can be composed of two code points, a and U+0300 COMBINING GRAVE ACCENT. You may also want to read How can I perform a Unicode aware character by character comparison? — Adriano Repetti
What is a surrogate pair?, What is a “surrogate pair” in Java? — phuclv
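
The first comment's point about à can be checked in a playground. This is only a minimal sketch, assuming Swift 4 or later (where `String` has `count` directly); the variable names are made up for illustration.

```swift
// "a" followed by U+0300 COMBINING GRAVE ACCENT: two scalars, one visible character.
let decomposed = "a\u{0300}"
// U+00E0 LATIN SMALL LETTER A WITH GRAVE: one scalar, one visible character.
let precomposed = "\u{00E0}"

let decomposedScalars = decomposed.unicodeScalars.map { $0.value }  // [0x61, 0x300]
let decomposedCharacterCount = decomposed.count                     // 1
let precomposedScalarCount = precomposed.unicodeScalars.count       // 1

// Swift compares strings by canonical equivalence, so the two spellings are equal.
let comparesEqual = (decomposed == precomposed)                     // true
```

So a scalar is the unit the standard assigns numbers to, while a Swift `Character` may be built from several of them.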

Jano Jano · Accepted Answer · 2018-05-20T21:34:12

A Unicode scalar is any code point that is not a surrogate code point, that is, any code point outside the range U+D800 to U+DFFF, which UTF-16 reserves for building surrogate pairs.

A code point is the number that the Unicode standard assigns to a character. For instance, the code point of the letter A is 0x41 (65 in decimal).

A code unit is one of the fixed-size groups of bits used to serialise a code point. For instance, UTF-16 uses one or two code units of 16 bits each.
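
To make the difference between code points and code units concrete, here is a small sketch, assuming Swift 4 or later. It uses the letter A and the character 𠀑 (U+20011) mentioned in the comments; the constant names are illustrative only.

```swift
let letterA = "A"
let hanCharacter = "\u{20011}"   // 𠀑, a code point above U+FFFF

// The code point (scalar value) of "A".
let codePointOfA = letterA.unicodeScalars.first!.value      // 0x41

// UTF-16 code units: one for "A", two (a surrogate pair) for 𠀑.
let utf16OfA = Array(letterA.utf16)                         // [0x0041]
let utf16OfHan = Array(hanCharacter.utf16)                  // [0xD840, 0xDC11]

// Even though it needs two UTF-16 code units, 𠀑 is still a single scalar.
let scalarCountOfHan = hanCharacter.unicodeScalars.count    // 1
let scalarValueOfHan = hanCharacter.unicodeScalars.first!.value  // 0x20011
```

The two values 0xD840 and 0xDC11 are surrogates: they only make sense together, and neither is a Unicode scalar by itself.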

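The letter A is a Unicode scalar, and in UTF-16 it takes a single code unit: 0x0041. Less common characters, those above U+FFFF, require two UTF-16 code units. This pair of code units is called a surrogate pair, and each unit of the pair is taken from the range U+D800 to U+DFFF, which is reserved for that purpose. The characters encoded this way are still Unicode scalars; what is excluded is the reserved range itself. Thus, a Unicode scalar may also be defined as: any code point except the surrogate code points U+D800 to U+DFFF.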
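
The ranges quoted in the question can also be checked against the standard library. The sketch below assumes Swift 4 or later, where the type is spelled `Unicode.Scalar` (in Swift 3 it was `UnicodeScalar`), and relies on its failable initializer taking a `UInt32`. `isInScalarRange` is a hypothetical helper written for this example, not a standard library function.

```swift
// Hypothetical helper: the range check taken literally from the book's definition.
func isInScalarRange(_ value: UInt32) -> Bool {
    return value <= 0xD7FF || (0xE000...0x10FFFF).contains(value)
}

// Boundary values around the surrogate range and the end of the code space.
let candidates: [UInt32] = [0x0041, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000]

// Unicode.Scalar(_: UInt32) returns nil for surrogates and for values above U+10FFFF.
let acceptedByInitializer = candidates.map { Unicode.Scalar($0) != nil }
// [true, true, false, false, true, true, false]

let acceptedByRangeCheck = candidates.map(isInScalarRange)
// Same result as the initializer for these inputs.
```

Note that U+10FFFF is accepted even though it needs a surrogate pair in UTF-16; only the values U+D800 to U+DFFF themselves (and anything beyond U+10FFFF) are rejected.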

The answer from courteouselk is correct, by the way; this is just a plainer English version.

