1
votes

I need to check whether a string contains Chinese characters. After searching, I found that I should match against the range \u31C0-\u31EF with a regex, but I can't get the regex to work.

Has anyone dealt with this before? Is the regex correct?

1
Using "[\u31C0-\u31EF]" will indeed match any character whose code point is in the range 0x31C0 to 0x31EF. You need the square brackets. I have no idea whether the actual numbers are correct; there are only 48 characters in this range, and I thought CJK had a lot more than that, but what do I know? - ajb
There's definitely more characters in CJK, see here. - juan.facorro
The duplicate is not marked with a java tag. Is this really a duplicate? - Suragch

1 Answer

2
votes

As discussed here, in Java 7 and later (where the regex compiler meets requirement RL1.2 Properties of UTS #18, Unicode Regular Expressions), you can use the following regex to match a Chinese (well, CJK) character:

\p{script=Han}

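which can be abbreviated to simply

\p{IsHan}

(In java.util.regex, a script name needs either the Is prefix or the script=/sc= keyword. The bare form \p{Han} is the abbreviation in engines such as Perl and PCRE, but Java rejects it as an unknown property name.)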
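
For reference, here is a minimal sketch of how the check from the question might look in Java. The class name HanCheck and the helpers containsHan and inStrokesRange are made up for illustration; only java.util.regex.Pattern is the real API. The second helper uses the range from the question, to show that it does not match an ordinary character such as U+4E2D.

```java
import java.util.regex.Pattern;

public class HanCheck {
    // One character whose Unicode script is Han (needs Java 7 or later).
    private static final Pattern HAN = Pattern.compile("\\p{script=Han}");

    // The range from the question, wrapped in square brackets so it is a character class.
    private static final Pattern STROKES_RANGE = Pattern.compile("[\\u31C0-\\u31EF]");

    // Hypothetical helper: true if the text contains at least one Han character.
    public static boolean containsHan(String text) {
        return HAN.matcher(text).find();
    }

    // Hypothetical helper: true if the text contains a character in U+31C0..U+31EF.
    public static boolean inStrokesRange(String text) {
        return STROKES_RANGE.matcher(text).find();
    }

    public static void main(String[] args) {
        String mixed = "hello \u4E16\u754C"; // "hello" followed by two Han characters
        System.out.println(containsHan(mixed));          // true
        System.out.println(containsHan("hello world"));  // false
        System.out.println(inStrokesRange("\u4E2D"));    // false: U+4E2D is outside the range
    }
}
```

Using find() rather than matches() means the string only has to contain a Han character somewhere, which is what the question asks for.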