This is my XML parser grammar:

attribute   :   Name '=' STRING ;

and the lexer:

STRING      :   '"' ~[<"]* '"'
            |   '\'' ~[<']* '\''
            ;

This works. However, when I retrieve the STRING token's text in my C# code with:

context.STRING().ToString();

I get the text wrapped in quotation marks ("hello" instead of hello). So I tried changing the parser grammar to:

attribute   :   Name '=' '"' STRING ;

or

attribute   :   Name '="' STRING ;

and I get the error: "cannot create implicit token for string literal in non-combined grammar"

I'm confused as to why the "=" is allowed in the parser grammar but quotation marks are not, and how to change the parser so that I retrieve the text without quotation marks. Also, it seems to me that the lexer already takes care of getting rid of the quotation marks, so I don't understand why I still get them when parsing.

I assume you have a rule like EQ: '='; in your lexer grammar, right? - sepp2k
@sepp2k Yes, I do: EQUALS : '=' ; However, the character literal is used here in the parser, right? - greenfeet

1 Answer

If you have separate lexer and parser grammars, you are allowed to use string literals in the parser if and only if you defined a lexer rule using that string literal in the lexer. Otherwise the lexer would never produce a token that matches that literal since the lexer has no idea which string literals do or don't appear in the parser (this is not the case for combined grammars, which is why the error message says "non-combined grammar").

So you're allowed to use '=', but not '"' because you have the rule EQUALS: '=';, but no rule DQUOTE: '"';. But before you go ahead and add such a rule, let's think about what that would do and whether you want this (you don't):

If you added such a rule (or used a combined grammar where you could just use '"' without it), the attribute rule would now match a name token, followed by a = token, followed by a " token, followed by a string token. Since a string token already contains quotes at its beginning and end, that would look something like this:

SomeName   =    "   "hello"
 Name     '='  '"'  STRING

So that's not what you want. Plus it wouldn't even work, even if that were what you wanted: the first quote in the above input wouldn't be recognized as a '"' token. Instead, the first two quotes and the whitespace between them would be recognized as a STRING token, then hello as a Name, and finally the last " as a '"' token (because there's no further quote that would make it match the STRING rule).

So this is the wrong direction to go and you shouldn't do that.


If what you want is to get the contents of the string without the quotes, the solution to that isn't to add more quotes to the grammar. You should just use Substring in your C# code to remove the first and last character from the string.
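
As a minimal sketch of that Substring approach: the helper below strips the first and last character from the token text. The names AttributeText and Unquote are hypothetical and only for illustration. It assumes the text comes from the STRING rule above, so it always starts and ends with a quote character.

```csharp
using System;

// Hypothetical helper, not part of ANTLR or the original grammar.
static class AttributeText
{
    // Removes the first and last character of a STRING token's text,
    // i.e. the surrounding pair of quote characters.
    public static string Unquote(string tokenText)
    {
        // The STRING rule always yields at least the two quote characters.
        if (tokenText == null || tokenText.Length < 2)
            throw new ArgumentException("expected a quoted string", nameof(tokenText));

        return tokenText.Substring(1, tokenText.Length - 2);
    }
}
```

In the listener or visitor this could be called as AttributeText.Unquote(context.STRING().ToString()); using GetText() on the terminal node instead of ToString() should give the same text. The grammar stays unchanged, and the same helper covers both the double-quoted and the single-quoted alternative of STRING.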