
From what I've gathered:

Hiragana is U+3040 to U+309F.

Katakana is U+30A0 to U+30FF.

U+4E00..U+9FFF covers part of the complete [Chinese] set, but not all of it.

The exact ranges for Chinese characters (except the extensions) are [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD].

CJK (for Chinese, Japanese, Korean) encompasses all characters for the Chinese Hànzì, the Japanese Kanji and the Korean Hanja (so they are all mixed together).

The linked answers don't fully explain where everything is. I'm wondering whether there is a clear answer to this, so I don't have to go through each character one by one.

2
This is not the kind of problem you ever want to have to solve, and it is quite unclear why you'd need to. Both Japanese and Korean have adopted Chinese glyphs in their scripts. – Hans Passant

2 Answers

2
votes

> so I don't have to go through each character one-by-one.

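You're supposed to examine the Unicode character properties rather than hard-code block ranges. The ranges listed here are the Script_Extensions property values for Unicode 12.1.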

Script_Extensions: Han (89513 characters)

U+02E80…U+02E99
U+02E9B…U+02EF3
U+02F00…U+02FD5
U+03001…U+03003
U+03005…U+03011
U+03013…U+0301F
U+03021…U+0302D
U+03030
U+03037…U+0303F
U+030FB
U+03190…U+0319F
U+031C0…U+031E3
U+03220…U+03247
U+03280…U+032B0
U+032C0…U+032CB
U+032FF
U+03358…U+03370
U+0337B…U+0337F
U+033E0…U+033FE
U+03400…U+04DB5
U+04E00…U+09FEF
U+0F900…U+0FA6D
U+0FA70…U+0FAD9
U+0FE45…U+0FE46
U+0FF61…U+0FF65
U+1D360…U+1D371
U+1F250…U+1F251
U+20000…U+2A6D6
U+2A700…U+2B734
U+2B740…U+2B81D
U+2B820…U+2CEA1
U+2CEB0…U+2EBE0
U+2F800…U+2FA1D

Script_Extensions: Hangul (11775 characters)

U+01100…U+011FF
U+03001…U+03003
U+03008…U+03011
U+03013…U+0301F
U+0302E…U+03030
U+03037
U+030FB
U+03131…U+0318E
U+03200…U+0321E
U+03260…U+0327E
U+0A960…U+0A97C
U+0AC00…U+0D7A3
U+0D7B0…U+0D7C6
U+0D7CB…U+0D7FB
U+0FE45…U+0FE46
U+0FF61…U+0FF65
U+0FFA0…U+0FFBE
U+0FFC2…U+0FFC7
U+0FFCA…U+0FFCF
U+0FFD2…U+0FFD7
U+0FFDA…U+0FFDC

Script_Extensions: Hiragana (431 characters)

U+03001…U+03003
U+03008…U+03011
U+03013…U+0301F
U+03030…U+03035
U+03037
U+0303C…U+0303D
U+03041…U+03096
U+03099…U+030A0
U+030FB…U+030FC
U+0FE45…U+0FE46
U+0FF61…U+0FF65
U+0FF70
U+0FF9E…U+0FF9F
U+1B001…U+1B11E
U+1B150…U+1B152
U+1F200

Script_Extensions: Katakana (356 characters)

U+03001…U+03003
U+03008…U+03011
U+03013…U+0301F
U+03030…U+03035
U+03037
U+0303C…U+0303D
U+03099…U+0309C
U+030A0…U+030FF
U+031F0…U+031FF
U+032D0…U+032FE
U+03300…U+03357
U+0FE45…U+0FE46
U+0FF61…U+0FF9F
U+1B000
U+1B164…U+1B167
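
If you want to use these lists in code, here is a minimal sketch in Python, assuming you simply paste the ranges in as text. The helper names (`parse_ranges`, `in_ranges`, `scripts_of`, `count`) are made up for illustration and are not part of any library. Only the Hiragana and Katakana tables are included to keep it short; Han and Hangul could be added the same way.

```python
from bisect import bisect_right

# Range lists copied from the answer above (Unicode 12.1, Script_Extensions).
# Only two of the four tables are included here for brevity.
RANGE_TEXT = {
    "Hiragana": """
        U+03001…U+03003 U+03008…U+03011 U+03013…U+0301F U+03030…U+03035
        U+03037 U+0303C…U+0303D U+03041…U+03096 U+03099…U+030A0
        U+030FB…U+030FC U+0FE45…U+0FE46 U+0FF61…U+0FF65 U+0FF70
        U+0FF9E…U+0FF9F U+1B001…U+1B11E U+1B150…U+1B152 U+1F200
    """,
    "Katakana": """
        U+03001…U+03003 U+03008…U+03011 U+03013…U+0301F U+03030…U+03035
        U+03037 U+0303C…U+0303D U+03099…U+0309C U+030A0…U+030FF
        U+031F0…U+031FF U+032D0…U+032FE U+03300…U+03357 U+0FE45…U+0FE46
        U+0FF61…U+0FF9F U+1B000 U+1B164…U+1B167
    """,
}


def parse_ranges(text):
    """Turn 'U+XXXX…U+YYYY' / 'U+XXXX' tokens into sorted (first, last) pairs."""
    ranges = []
    for token in text.split():
        first, _, last = token.partition("…")
        lo = int(first[2:], 16)                  # strip the leading 'U+'
        hi = int(last[2:], 16) if last else lo   # a single code point is a 1-wide range
        ranges.append((lo, hi))
    return sorted(ranges)


TABLES = {name: parse_ranges(text) for name, text in RANGE_TEXT.items()}


def in_ranges(ranges, cp):
    """Binary search: find the last range starting at or before cp, then check its end."""
    i = bisect_right(ranges, (cp, 0x110000)) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]


def scripts_of(ch):
    """Return the set of table names whose ranges contain the character."""
    return {name for name, ranges in TABLES.items() if in_ranges(ranges, ord(ch))}


def count(ranges):
    """Total number of code points covered by a range list."""
    return sum(hi - lo + 1 for lo, hi in ranges)
```

Note that `scripts_of` returns a set, not a single script: shared punctuation such as U+3001 appears in several of the lists above, so a character can belong to more than one script. Summing the range sizes with `count` is a cheap way to check that a pasted list matches the character count in its heading.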
1
votes

This is a sorted list containing anything that's used in Chinese, Japanese, or Korean (and also some Vietnamese)