In C, 'aaaaaaaaaaa' is neither a lexical nor a syntactic error, although its semantics are implementation-defined:
"The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined." (C standard, section 6.4.4.4, paragraph 10.)
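To see what your own compiler does, here is a minimal sketch. The helper names `multichar_is_int` and `show_multichar` are made up for illustration. Most compilers warn about the multi-character constant (GCC and Clang under `-Wmultichar`) but still accept it; the only thing the standard guarantees is that the constant has type `int`, so no particular printed value should be expected.

```c
#include <assert.h>
#include <stdio.h>

/* In C (unlike C++), an integer character constant always has type int,
   even when it contains more than one character. */
static int multichar_is_int(void) {
    return sizeof('ab') == sizeof(int);
}

/* Hypothetical helper: prints whatever value this implementation assigns
   to 'ab'. The value is implementation-defined, so it may differ between
   compilers. */
static void show_multichar(void) {
    assert(multichar_is_int());
    int v = 'ab';
    printf("'ab' has value %d (0x%X)\n", v, (unsigned)v);
}
```

Calling `show_multichar()` from `main` prints the value chosen by your implementation.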
It would have been easy to restrict character constants to a single character or escape sequence, but not by counting the display length of the character constant. (For example, 'ab' (length 4) would be illegal while '\x2C' (length 6) would be legal, and whether '\u00C3' (length 8) is legal would depend on the encoding.)
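A sketch of what such a restriction could look like if a scanner checked structure rather than length. The function names are hypothetical, and the check is purely syntactic: it accepts one ordinary character or one escape sequence (simple, octal, `\x`, `\u`, `\U`) and does not attempt to decide whether the result fits in a single-byte execution character, since that depends on the encoding.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns the number of source characters making up the single character
   or escape sequence that starts at s, or 0 if there is none. */
static size_t char_or_escape_len(const char *s) {
    if (*s == '\0' || *s == '\'' || *s == '\n') return 0;
    if (*s != '\\') return 1;              /* ordinary character */

    const char *p = s + 1;                 /* character after the backslash */
    if (*p == 'x') {                       /* \x followed by 1+ hex digits */
        p++;
        if (!isxdigit((unsigned char)*p)) return 0;
        while (isxdigit((unsigned char)*p)) p++;
        return (size_t)(p - s);
    }
    if (*p == 'u' || *p == 'U') {          /* \uXXXX or \UXXXXXXXX */
        int want = (*p == 'u') ? 4 : 8;
        p++;
        for (int i = 0; i < want; i++, p++)
            if (!isxdigit((unsigned char)*p)) return 0;
        return (size_t)(p - s);
    }
    if (*p >= '0' && *p <= '7') {          /* octal escape, 1 to 3 digits */
        int n = 0;
        while (n < 3 && *p >= '0' && *p <= '7') { p++; n++; }
        return (size_t)(p - s);
    }
    if (*p != '\0' && strchr("'\"?\\abfnrtv", *p)) return 2;  /* simple escape */
    return 0;
}

/* True if lexeme (quotes included) is a character constant containing
   exactly one character or escape sequence. */
static bool is_single_char_constant(const char *lexeme) {
    assert(lexeme != NULL);
    if (lexeme[0] != '\'') return false;
    size_t n = char_or_escape_len(lexeme + 1);
    return n > 0 && lexeme[1 + n] == '\'' && lexeme[2 + n] == '\0';
}
```

With this check, `'ab'` is rejected and `'\x2C'` is accepted, regardless of how many characters each lexeme occupies on the screen.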
In any case, the frontier between "lexical" and "syntactic" errors is not particularly well-defined, least of all for C, in which 23skidoo is a valid preprocessing token but not a valid token.
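One way to observe this, as a small sketch (the macro and function names are mine): the preprocessor treats 23skidoo as a single pp-number, so it can be stringified before anything tries to convert it into a real token.

```c
#include <assert.h>
#include <string.h>

/* Turns its argument into a string literal during preprocessing. */
#define STRINGIFY(x) #x

/* 23skidoo is a valid preprocessing token (a pp-number), so this compiles. */
static const char *pp_number_spelling(void) {
    return STRINGIFY(23skidoo);
}

static void check_pp_number(void) {
    assert(strcmp(pp_number_spelling(), "23skidoo") == 0);
}

/* By contrast, a declaration such as
       int x = 23skidoo;
   is rejected, because the pp-number cannot be converted into a valid
   token (compilers typically complain about an invalid suffix on an
   integer constant). */
```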
If your question is "should I detect and react to this error in the scanner or the parser?", I would answer "whichever seems most convenient to you". My preference, though, is to centralize all error handling in a single place, which means the parser. That requires the scanner to pass a special "bad token" token to the parser, which triggers error detection (and possibly recovery) there.
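A minimal sketch of that arrangement, for a toy grammar `NUM ('+' NUM)*` that I made up for the example. All names here (`TOK_BAD`, `next_token`, `parse_sum`) are hypothetical. The scanner never prints anything; it just hands malformed input to the parser as `TOK_BAD`, and the parser is the only place that reports errors. The recovery strategy (pretend the offending token was the expected one) is deliberately crude, and there is no overflow check on the numbers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

typedef enum { TOK_NUM, TOK_PLUS, TOK_BAD, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* first character of the lexeme */
    int len;            /* lexeme length in characters */
    long value;         /* meaningful only for TOK_NUM */
} Token;

/* Scanner: reports nothing itself; malformed input becomes TOK_BAD. */
static Token next_token(const char **src) {
    const char *p = *src;
    while (*p == ' ') p++;
    Token t = { TOK_EOF, p, 0, 0 };
    if (*p == '\0') { *src = p; return t; }

    if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUM;
        while (isdigit((unsigned char)*p))
            t.value = t.value * 10 + (*p++ - '0');
        /* Digits running into letters (e.g. 23skidoo): swallow the whole
           run and flag it as one bad token. */
        if (isalnum((unsigned char)*p)) {
            t.kind = TOK_BAD;
            while (isalnum((unsigned char)*p)) p++;
        }
    } else if (*p == '+') {
        t.kind = TOK_PLUS;
        p++;
    } else {
        t.kind = TOK_BAD;   /* any other character is a one-character bad token */
        p++;
    }
    t.len = (int)(p - t.start);
    *src = p;
    return t;
}

/* Parser: the single place where diagnostics are issued.
   Returns the number of errors; on success stores the sum in *result. */
static int parse_sum(const char *src, long *result) {
    assert(src != NULL && result != NULL);
    int errors = 0;
    long sum = 0;
    int expect_num = 1;  /* 1: operand expected next; 0: '+' or end expected */

    for (;;) {
        Token t = next_token(&src);
        if (t.kind == TOK_EOF) {
            if (expect_num) {
                fprintf(stderr, "error: unexpected end of input\n");
                errors++;
            }
            break;
        }
        if (expect_num && t.kind == TOK_NUM) {
            sum += t.value;
            expect_num = 0;
        } else if (!expect_num && t.kind == TOK_PLUS) {
            expect_num = 1;
        } else {
            fprintf(stderr, "error: %s '%.*s'\n",
                    t.kind == TOK_BAD ? "bad token" : "unexpected token",
                    t.len, t.start);
            errors++;
            /* Crude recovery: act as if the expected token had been seen. */
            expect_num = !expect_num;
        }
    }
    if (errors == 0) *result = sum;
    return errors;
}
```

For instance, `parse_sum("1 + 23skidoo + 4", &r)` reports one error for the bad token and then keeps parsing the rest of the input, which is the main benefit of routing the error through the parser.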
'some string' is (almost) entirely equivalent to "some string". Not so much in C or C++, where the single quotes are used for character literals, including multi-byte character literals, and the double quotes denote a string literal. – twalberg