The immediate problem is that you return
from the FILEID
production. return
returns, so the parse terminates with the returned value. Normally, semantic rules should provide their result by assigning to the variable $$
. (That's not necessary for "unit rules" where there is only one symbol on the right-hand side; before the action is executed, the parser does $$ = $1
, so if that's what you want, you can just leave the action out as you do in your two FILEID
rules.)
In addition, your expressions
production doesn't do anything with $2
, so even if you fixed the first problem, you would still only see one e
in the result.
Your expressions
production is also incorrect, since it requires one EOF
token for every e
, in addition to EOF from the base case. Consider how the productions work:
expressions -> e expressions EOF
-> e e expressions EOF EOF
-> e e e expressions EOF EOF EOF
-> e e e EOF EOF EOF EOF
Personally, I'd suggest using left-recursion instead of right-recursion. Bottom-up parsers like jison prefer left-recursion, and it usually leads to more natural semantic rules, as in this case.
Finally, you need to return the final value when you actually reach the end of the input. In jison, that usually requires an explicit start rule whose semantic action is return
.
So, with all that in mind, let's try this: (I changed the names of some of the non-terminals, and down-cased FILEID
because it is conventional to use lower-case for non-terminals and upper-case for terminals)
%start prog
%%
prog : exprs EOF { return $1; }
;
exprs : { $$ = []; }
| exprs expr { $$.push($2); }
;
expr : file_id
| STRING
;
file_id: FILEVERSION NUMBER { $$ = $1 + $2; }
;
One note about your regular expression for matching strings:
\"(\\.|[^"])*\" return 'STRING'
Although it apparently works in javascript (mostly; see below), it will exhibit a bug in flex (or a Posix-compatible regular expression library). It mostly works in javascript because the javascript regular expression alternation operator |
is ordered; if the first alternative matches, the second alternative is never tried unless the rest of the pattern fails to match (and in that case, the bug will be triggered).
But in (f)lex, the alternation operator notices all matching alternatives, and eventually chooses the longest possible match. The result is that when matching "\\"..."
, flex will match the token until the third quote, by using [^"]
to match the first \ and then \\.
to match the \". That lets it continue looking for a closing quote.
It's easy to write the regex so that it will work with either semantic, and I'd strongly suggest doing that in case you ever want to migrate to a different parser generator, by simply ensuring that \ is not matched by [^"]
:
\"(\\.|[^\\"])*\" return 'STRING'
This change will also fix the subtle bug, even in javascript, that "\"
is considered a valid string token if it is the last string in the input. In this case, javascript will first use \\.
to match \", but once it does that it will not find any closing quote. It will then backtrack and try matching with [^"]
, which will match the \, allowing the quote to then be recognized as the closing quote.