I am having some difficulties understanding the specific difference between the Lexical Grammar and the Syntactic Grammar in the ECMAScript 2017 specification.
Excerpts from ECMAScript 2017
5.1.2 The Lexical and RegExp Grammars
A lexical grammar for ECMAScript is given in clause 11. This grammar has as its terminal symbols Unicode code points that conform to the rules for SourceCharacter defined in 10.1. It defines a set of productions, starting from the goal symbol InputElementDiv, InputElementTemplateTail, or InputElementRegExp, or InputElementRegExpOrTemplateTail, that describe how sequences of such code points are translated into a sequence of input elements.
Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of the ECMAScript language.
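To check my reading of this excerpt, here is a toy Python sketch. It is not the spec's algorithm. `lex_one`, the regular expressions, and the exact set of kinds are my own simplifications. Only two of the four goal symbols are modelled, and `/=`, comments, escapes and templates are ignored. Each call reads single characters (the terminals) and returns one input element. The goal symbol decides whether `/` starts a RegularExpressionLiteral or is a DivPunctuator. White space also comes back as an input element, even though the excerpt says it is not a token.

```python
import re
from typing import Tuple

IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
DIGITS = re.compile(r"[0-9]+")
# Crude stand-in for RegularExpressionLiteral: /body/flags (no escapes, classes or comments).
REGEX = re.compile(r"/[^/\n]+/[A-Za-z]*")

def lex_one(src: str, pos: int, goal: str) -> Tuple[str, str, int]:
    """Apply the toy lexical grammar once at `pos` under the given goal symbol.
    Terminals are single characters; the result is ONE input element,
    returned as (kind, text, end_position)."""
    ch = src[pos]
    if ch in " \t":
        return ("WhiteSpace", ch, pos + 1)  # an input element, but not a token
    if ch == "/":
        if goal == "InputElementRegExp":
            m = REGEX.match(src, pos)
            if m is None:
                raise SyntaxError(f"unterminated regular expression at {pos}")
            return ("RegularExpressionLiteral", m.group(), m.end())
        # Any other goal (here only InputElementDiv): the same character is division.
        return ("DivPunctuator", "/", pos + 1)
    for kind, pattern in (("IdentifierName", IDENT), ("NumericLiteral", DIGITS)):
        m = pattern.match(src, pos)
        if m is not None:
            return (kind, m.group(), m.end())
    raise SyntaxError(f"unexpected character {ch!r} at {pos}")

print(lex_one("/ab/g", 0, "InputElementDiv"))     # ('DivPunctuator', '/', 1)
print(lex_one("/ab/g", 0, "InputElementRegExp"))  # ('RegularExpressionLiteral', '/ab/g', 5)
```

If I read clause 11 correctly, the syntactic context decides which goal symbol applies at each point, so the same characters can lex differently depending on where they appear.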
5.1.4 The Syntactic Grammar
When a stream of code points is to be parsed as an ECMAScript Script or Module, it is first converted to a stream of input elements by repeated application of the lexical grammar; this stream of input elements is then parsed by a single application of the syntactic grammar.
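As I understand the two steps in this excerpt, they look roughly like the following toy Python sketch. The grammar is a made-up miniature (assignments of sums), not ECMAScript's. `lex` and `parse_script` are hypothetical names. `lex` applies the lexical rules repeatedly, one input element per iteration. `parse_script` is applied once to the whole token list and succeeds only if the tokens form a single Script with nothing left over.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text)

# Toy lexical grammar: one named alternative per kind of input element (hypothetical).
ELEMENT = re.compile(
    r"(?P<WhiteSpace>\s+)"
    r"|(?P<IdentifierName>[A-Za-z_]\w*)"
    r"|(?P<NumericLiteral>\d+)"
    r"|(?P<Punctuator>[=+;])"
)

def lex(src: str) -> List[Token]:
    """Apply the toy lexical grammar repeatedly, one input element per iteration."""
    pos, tokens = 0, []
    while pos < len(src):
        m = ELEMENT.match(src, pos)
        if m is None:
            raise SyntaxError(f"no input element matches at offset {pos}")
        if m.lastgroup != "WhiteSpace":  # white space is an input element, not a token
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def parse_script(tokens: List[Token]) -> bool:
    """Apply the toy syntactic grammar once to the whole token list:
        Script    -> Statement*
        Statement -> IdentifierName '=' Expr ';'
        Expr      -> Term ('+' Term)*
        Term      -> IdentifierName | NumericLiteral
    Returns True only if the tokens form one Script with nothing left over."""
    pos = 0

    def accept(kind: str, text: Optional[str] = None) -> bool:
        # Consume the next token if it has the expected kind (and text, if given).
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] == kind and (text is None or tokens[pos][1] == text):
            pos += 1
            return True
        return False

    def term() -> bool:
        return accept("IdentifierName") or accept("NumericLiteral")

    def expr() -> bool:
        if not term():
            return False
        while accept("Punctuator", "+"):
            if not term():
                return False
        return True

    while pos < len(tokens):
        if not (accept("IdentifierName") and accept("Punctuator", "=")
                and expr() and accept("Punctuator", ";")):
            return False
    return True

tokens = lex("x = 1 + y;")
print(tokens)
print(parse_script(tokens))           # True
print(parse_script(lex("x = = 1;")))  # False
```

`x = = 1;` lexes without complaint but fails to parse, which is the distinction I am trying to pin down.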
Questions
- Lexical grammar
  - Here it says the terminal symbols are Unicode code points (individual characters).
  - It also says it produces input elements (a.k.a. tokens).
  - How can these be reconciled? Either the terminal symbols are tokens, and thus it produces tokens, or the terminal symbols are individual code points, and that is what it produces.
- Syntactic grammar
  - I have the same questions about this grammar as about the lexical grammar.
  - It seems to say that the terminal symbols here are tokens.
  - So does applying the syntactic grammar rules produce valid tokens, which are then sent to the parser? Or does this grammar accept tokens as input and then test whether the stream of tokens as a whole is valid?
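One way I can think about the reconciliation question is to treat both grammars as the same kind of object with different alphabets. The toy Python recognizer below takes any grammar and any sequence of terminals. The names `match`, `recognizes`, `LEXICAL` and `SYNTACTIC` are my own. For the lexical-style grammar, the terminals are characters. For the syntactic-style grammar, they are token kinds. The recognizer uses naive backtracking, so I wrote the grammars right-recursively. The spec writes, for example, DecimalDigits left-recursively, which this sketch would loop on forever.

```python
from typing import Dict, Iterator, List, Sequence

# A grammar maps each nonterminal to its alternatives; any symbol that is
# not a key is a terminal and must equal the next input item exactly.
Grammar = Dict[str, List[List[str]]]

def match(g: Grammar, symbols: Sequence[str], items: Sequence[str], pos: int) -> Iterator[int]:
    """Yield every position at which `symbols` can finish matching items[pos:]."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in g:  # nonterminal: try each alternative in turn
        for alt in g[first]:
            for mid in match(g, alt, items, pos):
                yield from match(g, rest, items, mid)
    elif pos < len(items) and items[pos] == first:  # terminal: must match exactly
        yield from match(g, rest, items, pos + 1)

def recognizes(g: Grammar, goal: str, items: Sequence[str]) -> bool:
    """True iff the whole of `items` is exactly one instance of `goal`."""
    return any(end == len(items) for end in match(g, [goal], items, 0))

# Lexical-style grammar: terminals are single characters.
LEXICAL: Grammar = {
    "DecimalDigits": [["DecimalDigit", "DecimalDigits"], ["DecimalDigit"]],
    "DecimalDigit": [[d] for d in "0123456789"],
}

# Syntactic-style grammar: terminals are token kinds that a lexer would produce.
SYNTACTIC: Grammar = {
    "Expr": [["Term", "+", "Expr"], ["Term"]],
    "Term": [["IdentifierName"], ["NumericLiteral"]],
}

print(recognizes(LEXICAL, "DecimalDigits", list("2017")))                          # True
print(recognizes(LEXICAL, "DecimalDigits", list("20x7")))                          # False
print(recognizes(SYNTACTIC, "Expr", ["IdentifierName", "+", "NumericLiteral"]))  # True
print(recognizes(SYNTACTIC, "Expr", ["IdentifierName", "+"]))                    # False
```

In this picture, one successful lexical match (for example the characters `2017` as one NumericLiteral) becomes a single terminal item for the syntactic grammar.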
My Best Guess
- Lexing phase
  - Input: code points (the source code)
  - Output: valid tokens (token type + value), produced by applying the lexical grammar productions
- Parsing phase
  - Input: tokens
  - Output: a decision, made by applying the syntactic grammar productions (a CFG), on whether all the tokens together form a valid stream (i.e., whether the source code as a whole is a valid Script/Module)
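Here is my guess written as a toy Python pipeline, under my own assumptions. The `Token` class, the kind names and the arithmetic grammar are invented for illustration, and this is not how the spec or any engine is structured. The lexing phase turns code points into `Token(type, value)` objects and drops white space. The parsing phase is a small recursive-descent parser that either raises `SyntaxError` or returns a nested-tuple tree, which is slightly more than the yes/no answer in my guess.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass(frozen=True)
class Token:
    type: str   # kind of token, e.g. "NumericLiteral" (hypothetical kind names)
    value: str  # the matched source text

SCANNER = re.compile(r"(?P<WhiteSpace>\s+)|(?P<NumericLiteral>\d+)|(?P<Punctuator>[+*()])")

def lexing_phase(source: str) -> List[Token]:
    """Input: code points. Output: tokens (type + value), with white space dropped."""
    tokens, pos = [], 0
    while pos < len(source):
        m = SCANNER.match(source, pos)
        if m is None:
            raise SyntaxError(f"invalid character at offset {pos}")
        if m.lastgroup != "WhiteSpace":
            tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens

Tree = Union[str, Tuple[str, "Tree", "Tree"]]

def parsing_phase(tokens: List[Token]) -> Tree:
    """Input: tokens. Output: a tree for the toy grammar
        Expr -> Term ('+' Term)*,  Term -> Factor ('*' Factor)*,
        Factor -> NumericLiteral | '(' Expr ')'.
    Raises SyntaxError unless the tokens form exactly one Expr."""
    pos = 0

    def peek() -> str:
        return tokens[pos].value if pos < len(tokens) else ""

    def advance() -> Token:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor() -> Tree:
        tok = advance()
        if tok.type == "NumericLiteral":
            return tok.value
        if tok.value == "(":
            inner = expr()
            if advance().value != ")":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {tok.value!r}")

    def term() -> Tree:
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr() -> Tree:
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):  # leftover tokens mean the stream is not one Expr
        raise SyntaxError(f"unexpected token {tokens[pos].value!r}")
    return tree

print(parsing_phase(lexing_phase("1 + 2 * (3 + 4)")))
# ('+', '1', ('*', '2', ('+', '3', '4')))
```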