I have a program that reads a large amount of text and analyzes it. The text may be in any language, but I need to detect Japanese and Chinese specifically so that I can analyze them differently.
I have read that I can test each character's Unicode code point to find out whether it falls in the range of CJK characters. This is helpful, but I would like to separate the two if possible so that I can process the text against different dictionaries. Is there a way to test whether a character is Japanese OR Chinese?
\p{Han}\p{Hiragana}\p{Katakana}
but the following characters are not matching: 发同讲说宅电的手机告的世全所回广讲说跟 – yarian
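For what it's worth, here is a rough sketch of the code point range test described in the question, extended with a simple heuristic for telling the two languages apart. Because of Han unification, a single ideograph cannot be reliably labeled Chinese or Japanese by its code point alone, so this sketch assumes that any text containing kana is Japanese and that Han-only text is Chinese. The function names (`is_kana`, `is_han`, `guess_language`) are made up for illustration, and only the Hiragana, Katakana, CJK Unified Ideographs and Extension A blocks are checked.

```python
def is_kana(ch):
    """True for characters in the Hiragana or Katakana blocks."""
    cp = ord(ch)
    return 0x3040 <= cp <= 0x309F or 0x30A0 <= cp <= 0x30FF


def is_han(ch):
    """True for CJK Unified Ideographs and Extension A (shared by Chinese and Japanese)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def guess_language(text):
    """Heuristic guess (an assumption, not a rule): kana implies Japanese,
    Han without any kana is treated as Chinese, anything else is 'other'."""
    if any(is_kana(c) for c in text):
        return "japanese"
    if any(is_han(c) for c in text):
        return "chinese"
    return "other"


print(guess_language("これは日本語です"))
print(guess_language("发同讲说宅电的手机"))
print(guess_language("plain ASCII text"))
```

The obvious weak spot is a short Japanese string written entirely in kanji, which this would report as Chinese. The guess gets more dependable on longer passages, where Japanese text almost always contains some kana.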