As far as I understand, most languages are context-free with some exceptions. For instance, a * b
may stand for type * pointer_declaration
or multiplication in C++. Which one takes place depends on the context, the meaning of the first identifier. Another example is name
production in VHDL
enum_literal ::= char_literal | identifer
physical_literal ::= [num] unit_identifier
func_call ::= func_identifier [parenthized_args]
array_indexing ::= arr_name (index_expr)
name ::= func_call | physical_literal | enum_litral | array_indexing
You see that syntactic forms are different but they can match if optional parameters are omitted, like f
, does it stand for func_call, physical_literal, like 1 meter with optional amount 1 is implied, or enum_literal.
Talking to Scala plugin designers, I was educated to know that you build AST to re-evaluate it when dependencies change. There is no need to re-parse the file if you have its AST. AST also worth to display the file contents. But, AST is invalidated if grammar is context-sensitive (suppose that f
was a function, defined in another file, but later user requalified it into enum literal or undefined). AST changes in this case. AST changes on whenever you change the dependencies. Another option, that I am asking to evaluate and let me know how to make it, is to build an ambiguous AST.
As far as I know, parser combinators are of PEG kind. They hide the ambiguity by returning you the first matched production and f
would match a function call because it is the first alternative in my grammar. I am asking for a combinator that instead of falling back on the first success, it proceeds to the next alternative. In the end, it would return me a list of all matching alternatives. It would return me an ambiguity.
I do not know how would you display the ambiguous file contents tree to the user but it would eliminate the need to re-parse the dependent files. I would also be happy to know how modern language design solve this problem.
Once ambiguous node is parsed and ambiguity of results is returned, I would like the parser to converge because I would like to proceed parsing beyond the name
and I do not want to parse to the end of file after every ambiguity. The situation is complicated by situations like f(10)
, which can be a function call with a single argument or a nullary function call, which return an array, which is indexed afterwards. So, f(10) can match name two ways, either as func_call
directly or recursively, as arr_indexing -> name ~ (expr)
. So, it won't be ambiguity like several parallel rules, like fcall | literal
. Some branches may be longer than 1 parser before re-converging, like fcall ~ (expr) | fcall
.
How would you go about solving it? Is it possible to add ambiguating combinator to PEG?