0
votes

I am checking whether a CString variable contains only Chinese characters. The Unicode range for Chinese characters is U+4E00 to U+9FFF.

This is what I am doing:

CString str;
char ch;
GetDlgItemText( IDC_EDIT1, str );

for(int i=0;i<str.GetLength();i++) {
  ch=str[i];
  // Even when a Chinese character is entered, this 'if' evaluates to false.
  if(ch>='\u4E00'&&ch<='\u9FFF') {
    SetDlgItemText( IDC_RICHEDIT21, str );
    SendDlgItemMessage( IDC_RICHEDIT21, EM_REPLACESEL, TRUE, (LPARAM)(LPCTSTR)str);
  } else
    break;
}

But if I do

if(ch=='\u4E00')

and enter the character U+4E00, then it works fine.

So my question is: how do I check whether a character lies within a particular Unicode range?

One more thing: if I use if(ch=='\u4e00') it evaluates to true, but if I use if(ch<='\u4e00') it evaluates to false. I don't understand this behavior!

My code is

CString str;
wchar_t ch;
GetDlgItemText( IDC_EDIT1, str );
for(int i=0;i<str.GetLength();i++) {
  ch=str[i];
  if(ch<='\u4e01') {
    //returns false,  but returns true if(ch=='\u4e01')
    SetDlgItemText( IDC_RICHEDIT21, str );
    SendDlgItemMessage( IDC_RICHEDIT21, EM_REPLACESEL, TRUE, (LPARAM)(LPCTSTR)str);
  } else
    break;
}
Um, you are using char, not wchar. - Raymond Chen
If I use wchar I get the following error: test3Dlg.cpp(155): error C2065: 'wchar' : undeclared identifier - Nomesh Gajare
It's WCHAR (defined by the Windows headers), or wchar_t to use the C++ type. - Cody Gray
That is the range for CJK Unified Ideographs. Don't forget about the CJK Radicals Supplement, Kangxi Radicals, CJK Symbols and Punctuation, CJK Unified Ideographs Extension A, CJK Compatibility Ideographs, CJK Unified Ideographs Extension B, CJK Unified Ideographs Extension C, CJK Unified Ideographs Extension D, or CJK Compatibility Ideographs Supplement. Note that some of these are above U+FFFF. And don't forget about the upcoming CJK Unified Ideographs Extension E or F, which do not yet have codepoints. Likely you'll need to rethink what you are trying to accomplish. - Dono
You are ignoring warnings from the compiler. Don't. Casting just digs you a deeper hole. A wide character literal requires an L in front, like L'\u4e00'. - Hans Passant

3 Answers

1
votes

Chinese character ranges:

  • U+3400 - U+4DB5
  • U+4E00 - U+62FF
  • U+6300 - U+77FF
  • U+7800 - U+8CFF
  • U+8D00 - U+9FCC
  • U+20000 - U+215FF
  • U+21600 - U+230FF
  • U+23100 - U+245FF
  • U+24600 - U+260FF
  • U+26100 - U+275FF
  • U+27600 - U+290FF
  • U+29100 - U+2A6DF
  • U+2A700 - U+2B734
  • U+2B740 - U+2B81D

You'll have to check all these ranges to be complete and thorough.
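As a rough sketch of checking all of those ranges, the listed sub-ranges can be merged where they are adjacent and stored in a small table. The names `Range`, `kHanRanges`, `IsHanCodePoint` and `IsAllHan` are made up for illustration, and `std::u16string` stands in for a Unicode-build CString, which is assumed to hold UTF-16. Ranges above U+FFFF occupy two UTF-16 code units (a surrogate pair), so the pair has to be combined into one code point before the lookup. The table only reflects the list above; newer Unicode versions define more ideographs.

```cpp
#include <cstddef>
#include <string>

// Inclusive code point ranges from the list above, with adjacent
// sub-ranges merged (for example U+4E00-U+62FF ... U+8D00-U+9FCC).
struct Range { char32_t first; char32_t last; };

constexpr Range kHanRanges[] = {
    {0x3400,  0x4DB5},
    {0x4E00,  0x9FCC},
    {0x20000, 0x2A6DF},
    {0x2A700, 0x2B734},
    {0x2B740, 0x2B81D},
};

// Hypothetical helper: true if cp falls inside any listed range.
bool IsHanCodePoint(char32_t cp) {
    for (const Range& r : kHanRanges) {
        if (cp >= r.first && cp <= r.last) return true;
    }
    return false;
}

// Hypothetical helper: true if every code point of a UTF-16 string is in
// the table. An empty string yields true.
bool IsAllHan(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {               // high surrogate
            if (i + 1 >= s.size()) return false;          // truncated pair
            char32_t low = s[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) return false;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i;                                          // skip the low surrogate
        }
        if (!IsHanCodePoint(cp)) return false;
    }
    return true;
}
```

A plain 16-bit comparison such as the one in the question can never match the ranges starting at U+20000, which is why the surrogate decoding step is there.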

0
votes

The range of the "char" type is -128 to 127 or 0 to 255, depending on your compiler. Use "wchar_t" or "unsigned short" instead, which on Windows ranges from 0 to 65535; otherwise the variable cannot represent those Unicode characters.
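To illustrate the point, here is a minimal sketch of the range test from the question written with wchar_t and wide character literals (note the L prefix, as mentioned in the comments). The function name `IsInCjkUnifiedBlock` is hypothetical, and the `static_assert` assumes a platform where char is 8 bits wide.

```cpp
#include <climits>

// Hypothetical helper: true if ch lies in U+4E00..U+9FFF, the range used
// in the question. The L prefix makes both literals wchar_t values.
bool IsInCjkUnifiedBlock(wchar_t ch) {
    return ch >= L'\u4E00' && ch <= L'\u9FFF';
}

// With an 8-bit char, no char value can ever reach 0x4E00.
static_assert(CHAR_MAX < 0x4E00, "char cannot represent U+4E00");
```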

By the way, you should not place SetDlgItemText and SendDlgItemMessage inside that "if" block. Declare the variable "i" before the "for" loop, and after the loop check whether i equals str.GetLength().
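The loop structure described here might look roughly like the following. This is only a sketch: `AllInCjkUnifiedBlock` is a hypothetical name, and `std::wstring` stands in for CString so that the snippet compiles without MFC.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if every character is in U+4E00..U+9FFF.
// An empty string yields true.
bool AllInCjkUnifiedBlock(const std::wstring& str) {
    std::size_t i = 0;                  // declared before the loop so it outlives it
    for (; i < str.size(); ++i) {
        wchar_t ch = str[i];
        if (ch < L'\u4E00' || ch > L'\u9FFF')
            break;                      // first character outside the range
    }
    return i == str.size();             // true only if the loop never hit break
}
```

In the dialog code, the SetDlgItemText and SendDlgItemMessage calls would then run once, after this check succeeds, rather than once per character.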

0
votes

I got the answer. It can be compared as follows:

CString str;
wchar_t ch;
GetDlgItemText( IDC_EDIT1, str );

for(int i=0;i<str.GetLength();i++) {
  ch=str[i];
  if((unsigned int)ch>=0x4E00u&&(unsigned int)ch<=0x9FFFu) {
    SetDlgItemText( IDC_RICHEDIT21, str);
    SendDlgItemMessage( IDC_RICHEDIT21, EM_REPLACESEL, TRUE, (LPARAM)(LPCTSTR)str);
  } else
    break;
}