3
votes

I've got a rule to match a string that looks like so:

STRING
    : '"' ( ~( '"' | '\\' ) | '\\' . )* '"'
    ;

I don't want the quotes to be part of the token's text. In Antlr2 I would just put '!' after each quote to tell Antlr not to add it to the text.

Notice the '!' below.

 STRING
    : '"'! ( ~( '"' | '\\' ) | '\\' . )* '"'!
    ;

However, in Antlr3 I can no longer do this, as I get the following warning:

warning(149): Crv__.g:0:0: rewrite syntax or operator with no output option; setting output=AST

I don't know if I can use a rewrite rule here, as I don't know how to write one for the match-everything token '.'.

My only other thought is to grab the matched text and return it without the quotes, but I'm not sure how to do that, as the token hasn't been created yet.

I'm using the C Antlr runtime. How can I accomplish this?


2 Answers

1
votes

For posterity, I'll mention how I ended up solving this.

I used an @after block to strip the quotes:

STRING
@after
{
    /* Replace the token text with the matched text minus the surrounding quotes. */
    SETTEXT(GETTEXT()->subString(GETTEXT(), 1, GETTEXT()->len - 1));
}
: '"' ( ~( '"' | '\\' ) | '\\' . )* '"'
;
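
To make the index arithmetic in that action concrete, here is a minimal sketch in plain C of the same idea: keep everything between the first and last character of the matched text. It does not use the ANTLR runtime; strip_quotes is a hypothetical helper, and the caller-supplied buffer is an assumption made for illustration. Escape sequences such as \" are copied through unchanged, just as in the @after version.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the ANTLR C runtime.
 * Copies `lexeme` without its first and last character into `out`.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * the lexeme is not quote-delimited or `out` is too small. */
int strip_quotes(const char *lexeme, char *out, size_t out_size)
{
    size_t len = strlen(lexeme);

    /* A STRING lexeme is at least `""` and starts and ends with a quote. */
    if (len < 2 || lexeme[0] != '"' || lexeme[len - 1] != '"')
        return -1;

    size_t inner = len - 2;           /* characters between the quotes */
    if (inner + 1 > out_size)         /* need room for the terminating NUL */
        return -1;

    memcpy(out, lexeme + 1, inner);   /* skip the opening quote, stop before the closing one */
    out[inner] = '\0';
    return (int)inner;
}
```

Called on the six-character lexeme "a\"b" (quotes included), this writes the four characters a\"b and returns 4; the backslash is still there, so any unescaping would be a separate step.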
0
votes

This is the solution I ended up using:

STRING          :       '"'         { \$s = ""; }
                (   '"' '"'         { \$s .= '"';}
                |   c=CHAR          { \$s .= \$c->gettext();}
                |   ' '             { \$s .= ' ';}
                )*
                '"'                 { \$this->setText(\$s); }
    ;



fragment CHAR       :   (ACCENT|SPECIAL|ALPHA|DIGIT);
fragment ACCENT     :   '\u00C0'..'\u00D6' | '\u00D9'..'\u00DD' | '\u00E0'..'\u00F6' |'\u00F9'..'\u00FD';
fragment SPECIAL    :   '.' | '!' | '-'| '?';
fragment ALPHA      :   'a'..'z' | 'A'..'Z';
fragment DIGIT      :   '0'..'9' ;

One minor difference is that I use a whitelist of characters for security reasons.

The major difference is that I build the result string incrementally, discarding the surrounding " characters and collapsing each doubled "" into a single ".

I'm using PHP, which is why the dollar signs are written as \$. Do you know which approach is faster?
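
For readers on the C runtime, the incremental approach can be sketched in plain C as follows. This is an illustration under assumptions, not a translation of the PHP actions: is_allowed and unquote are hypothetical names, the whitelist covers only ALPHA, DIGIT, SPECIAL and the space (the ACCENT ranges are left out to keep the sketch ASCII-only), and nothing here calls the ANTLR runtime.

```c
#include <stddef.h>

/* Hypothetical whitelist mirroring ALPHA, DIGIT, SPECIAL and ' ' from the
 * grammar above. The ACCENT ranges are deliberately omitted. */
int is_allowed(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           c == ' ' || c == '.' || c == '!' || c == '-' || c == '?';
}

/* Hypothetical helper: builds the unquoted text one character at a time.
 * A doubled quote inside the string becomes a single quote in the output.
 * Returns the output length, or -1 on malformed input or a too-small buffer. */
int unquote(const char *src, char *out, size_t out_size)
{
    size_t n = 0;
    const char *p = src;

    if (out_size == 0 || *p++ != '"')
        return -1;                      /* must start with an opening quote */

    for (;;) {
        char c = *p++;
        if (c == '"') {
            if (*p != '"')
                break;                  /* single quote: end of the string */
            p++;                        /* doubled quote: keep one literal quote */
        } else if (!is_allowed((unsigned char)c)) {
            return -1;                  /* not whitelisted (also catches a missing closing quote) */
        }
        if (n + 1 >= out_size)
            return -1;                  /* no room left for this char plus the NUL */
        out[n++] = c;
    }

    if (*p != '\0')
        return -1;                      /* trailing characters after the closing quote */
    out[n] = '\0';
    return (int)n;
}
```

On the question of speed, the sketch only shows where the work goes: this version inspects and appends every character, while the substring approach copies one contiguous span. Which is faster in a real lexer would need measuring on the target in question.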