How can I use Lex/Yacc to recognize identifiers in Chinese characters?
2 Answers
I think you mean Lex (the lexer generator). Yacc is the parser generator.
According to What's the complete range for Chinese characters in Unicode?, most CJK characters fall in the U+3400-U+9FFF range.
According to http://dinosaur.compilertools.net/lex/index.html
Arbitrary character. To match almost any character, the operator character . is the class of all characters except newline. Escaping into octal is possible although non-portable:
[\40-\176]
matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde).
So I would assume what you need is something like [\32000-\117777].
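For concreteness, here is a minimal flex specification built around the quoted octal class (the printf action is just illustrative). One caveat: in standard lex/flex an octal escape names a single byte, so it tops out at \377; a range like [\32000-\117777] would only work in a lex variant that matches whole code points rather than bytes.

%{
#include <stdio.h>
%}
%%
[\40-\176]+    { printf("printable run: %s\n", yytext); }
\n             { /* skip newlines */ }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }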
Yacc does not care about Chinese characters, but lex does: it is responsible for analyzing the input bytes (and characters) to recognize tokens. However, Chinese characters are generally multibyte. There are lex-like programs which may support this, but they are not lex. It has been discussed several times.
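A common workaround, described in the further-reading links below, is to spell out each code point's UTF-8 byte sequence in the patterns. Here is a minimal flex sketch along those lines, assuming UTF-8 input and the U+3400-U+9FFF range from the first answer (the byte ranges are my own derivation, and the identifier rule is deliberately simplistic):

%{
#include <stdio.h>
%}
/* UTF-8 encodings for the range U+3400-U+9FFF:
   U+3400-U+3FFF  =>  E3 90-BF 80-BF
   U+4000-U+9FFF  =>  E4-E9 80-BF 80-BF  */
CJK   (\xE3[\x90-\xBF][\x80-\xBF]|[\xE4-\xE9][\x80-\xBF][\x80-\xBF])
%%
{CJK}+         { printf("IDENT: %s\n", yytext); }
.|\n           { /* anything else: ignored in this sketch */ }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }

Built with something like flex cjk.l && cc lex.yy.c, this prints IDENT: for each run of CJK characters in a UTF-8 stream; a real lexer would return a token to yacc instead of printing.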
Further reading:
The standard lexical tokenizer, lex (or flex), does not accept multi-byte characters, and is thus impractical for many modern languages. This document describes a mapping from regular expressions describing UTF-8 multi-byte characters to regular expressions of single bytes.
Flex (lexer) support for Unicode (2012/3/8)
Answers point out how you can work around the limitation by using special cases of UTF-8 patterns.
Unicode Support in Flex (2009/4/26)
Essentially the same question as the previous one (but it predates it, and is a possible source for those comments).
How do I lex unicode characters in C?
An answer lists some alternative implementations which may do what was asked here.