I have some Japanese text that has some non-Japanese Chinese characters mixed into it. I noticed because the Japanese font I use does not support them, so the browser renders them in a different font. As far as I can tell, those characters are not used in Japanese, so they got there by mistake (the text comes from OCR). I used this to find kanji in the text, but it appears to match all Chinese characters and not just kanji. Is there a reliable way to detect those non-Japanese characters, such as checking certain Unicode ranges?
The only solution I can think of is to build (or, more likely, find) a complete list of the kanji that are in use and check whether each character is on it, but I suspect that might be a little slow. Nonetheless, if I don't find a better way to achieve this, I'll probably solve it this way.
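
For reference, here is a minimal sketch of the list-based check described above, written in Python as an assumption since no language is specified. The function name `find_suspect_chars` and the tiny `allowed` list are hypothetical placeholders; a real list (for example the jōyō kanji plus whatever else the text legitimately uses) would have to come from elsewhere. It only looks at the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and since Unicode encodes Chinese and Japanese ideographs in the same blocks, the range alone cannot tell them apart, which is why the list is needed.

```python
import re

# Matches a single character in the basic CJK Unified Ideographs block.
# Assumption: the extension blocks are ignored to keep the sketch short.
HAN = re.compile(r"[\u4e00-\u9fff]")


def find_suspect_chars(text, allowed_kanji):
    """Return (index, char) pairs for Han characters not in allowed_kanji."""
    # A frozenset gives constant-time membership tests on average.
    allowed = frozenset(allowed_kanji)
    return [
        (i, ch)
        for i, ch in enumerate(text)
        if HAN.match(ch) and ch not in allowed
    ]


if __name__ == "__main__":
    # Hypothetical, deliberately tiny list of accepted kanji.
    allowed = "日本語漢字"
    sample = "日本語の汉字"
    print(find_suspect_chars(sample, allowed))
```

Because the list is held in a set, each lookup takes roughly constant time, so the check should scale linearly with the length of the text rather than with the size of the kanji list.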