Detect if character is simplified or traditional Chinese character

Question

I found this question, which lets me check whether a string contains a Chinese character. I'm not sure the Unicode ranges are correct, but they seem to return false for Japanese and Korean and true for Chinese.

What it doesn't do is tell if the character is traditional or simplified Chinese. How would you go about finding this out?

update

Q: How can I recognize from the 32 bit value of a Unicode character if this is a Chinese, Korean or Japanese character?

http://unicode.org/faq/han_cjk.html

Their argument is that the characters, regardless of their shape, have the same meaning and therefore should be represented by the same code. Well, the distinction is not meaningless to me, because I am analyzing individual characters, and that doesn't work with their solution:

A better solution is to look at the text as a whole: if there's a fair amount of kana, it's probably Japanese, and if there's a fair amount of hangul, it's probably Korean.
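
For reference, the whole-text heuristic quoted above could be sketched roughly like this in Python. This is only an illustration: the function name `guess_script`, the 10% `threshold`, and the exact block ranges chosen are my own assumptions, not anything the FAQ prescribes.

```python
def guess_script(text, threshold=0.1):
    """Guess 'ja', 'ko', 'zh' or 'unknown' from the mix of scripts in text.

    The threshold is an arbitrary illustrative value (an assumption).
    """
    kana = hangul = han = 0
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:      # Hiragana and Katakana blocks
            kana += 1
        elif 0xAC00 <= cp <= 0xD7A3:    # precomposed Hangul syllables
            hangul += 1
        elif 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs (main block only)
            han += 1

    total = kana + hangul + han
    if total == 0:
        return "unknown"
    if kana / total >= threshold:       # a fair amount of kana: probably Japanese
        return "ja"
    if hangul / total >= threshold:     # a fair amount of hangul: probably Korean
        return "ko"
    return "zh"                         # Han characters only: assume Chinese
```

Note that this says nothing about simplified versus traditional; it only separates Japanese and Korean text from text written purely in Han characters.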

Would the code page help distinguish? It seems that Simplified Chinese is CP 936 and Traditional is CP 950, at least in the Microsoft world. Perhaps start at i18nguy.com/unicode/codepages.html for the MS and IBM code pages. — rajah9
I did a quick Google search and found this: unicode.org/faq/han_cjk.html. I found some of the questions interesting, and they discuss Traditional characters in there too. Hope it helps! — Shaded
Shaded's linked FAQ seems to answer your question exactly. As the example in the link notes, how would you determine if "chat" is English or French? If you don't think that your answer is in there, you might want to expand your question a bit. — Thanatos
It's a good link, one that I had already found. It is quite complicated. The orthography of chat/chat en/fr surely makes them indistinguishable; however, if we used the IPA to write chat/chat [ʃæ/tʃæt], it would be possible to tell them apart through syllable construction, because it would be based on sound and not an archaic orthography. — thenengah
But Chinese is much less complicated, because 說/说 [t/s shuo1 'to speak'] are completely different characters, one being the traditional form of "to speak" and the other the simplified form. They have different Unicode values, as opposed to a/a en/fr, which share the same character code. — thenengah
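
To illustrate the code page idea raised in the comments, here is a rough Python sketch that checks whether a single character can be encoded in a simplified-oriented or a traditional-oriented legacy charset. It uses Python's built-in `gb2312` and `big5` codecs rather than CP 936, because CP 936 (GBK) also covers traditional forms and so would not separate the two. The helper names `can_encode` and `classify_han` are made up for this example, and the result is only a heuristic: many characters are shared by both sets, and some fall outside both.

```python
def can_encode(ch, codec):
    """Return True if ch is representable in the given legacy codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def classify_han(ch):
    """Heuristically label a character by which legacy charsets contain it."""
    in_gb = can_encode(ch, "gb2312")    # simplified-oriented charset
    in_big5 = can_encode(ch, "big5")    # traditional-oriented charset
    if in_gb and not in_big5:
        return "simplified"
    if in_big5 and not in_gb:
        return "traditional"
    if in_gb and in_big5:
        return "both"                   # shared form, cannot be told apart
    return "unknown"                    # in neither charset


# The pair from the comment above has distinct code points:
print(hex(ord("說")), classify_han("說"))
print(hex(ord("说")), classify_han("说"))
print(hex(ord("人")), classify_han("人"))
```

A "both" result is the common case for characters that were never simplified, which is why this kind of check works better as a tally over a whole text than as a verdict on one character.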

Tom Anderson · Accepted Answer · 2011-01-06T21:25:14

As I think you've discovered, you can't. Simplified and traditional are just two styles of writing the same characters - it's like the difference between Roman and Gothic script for European languages.

