3
votes

I am trying to write an ANTLR4 grammar to parse ActionScript 3. I've decided to start with something fairly coarse-grained:

grammar actionscriptGrammar;

OBRACE:'{';
CBRACE:'}';
STRING_DELIM:'"';

BLOCK_COMMENT : '/*' .*? '*/' -> skip;
EOL_COMMENT : '//' .*? '/n' -> skip;
WS: [ \n\t\r]+ -> skip;

TEXT: ~[{} \n\t\r"]+;

thing
    : TEXT
    | string_literal
    | OBRACE thing+? CBRACE;

string_literal : STRING_DELIM .+? STRING_DELIM;

start_rule
    : thing+?;

Basically, I want a tree of things grouped by their lexical scope. I want comments to be ignored, and string literals to be their own things so that any braces they contain do not affect lexical scope. The string_literal rule works fine (such as it is), but the two comment rules don't appear to have any effect, i.e. comments aren't being ignored.

What am I missing?

2 Answers

7
votes

This is from a simplified Java grammar I wrote in ANTLR v4.

WS
    : [ \t\r\n]+ -> channel(HIDDEN)
;

COMMENT
    : '/*' .*? '*/' -> skip
;

LINE_COMMENT
    : '//' ~[\r\n]* -> skip
;

Maybe this could help you out.

Also, try rearranging your code: write the parser rules first and the lexer rules last, following a top-down approach. I find that much more helpful when debugging. It also looks nicer when you create an HTML export of your grammar from the ANTLR 4 Eclipse plugin.
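
As a rough illustration of that layout, here is the grammar from the question with the parser rules moved to the top and the three lexer rules above swapped in for its original whitespace and comment rules. The parser rule bodies are copied unchanged from the question. This is untested as a whole file, so treat it as a sketch of the ordering rather than a verified grammar.

```antlr
grammar actionscriptGrammar;

// Parser rules first, entry point at the top (bodies unchanged from the question).
start_rule
    : thing+?
    ;

thing
    : TEXT
    | string_literal
    | OBRACE thing+? CBRACE
    ;

string_literal
    : STRING_DELIM .+? STRING_DELIM
    ;

// Lexer rules last. When two lexer rules match the same length of input,
// the one defined first wins, so the comment rules stay above TEXT.
OBRACE       : '{' ;
CBRACE       : '}' ;
STRING_DELIM : '"' ;

WS           : [ \t\r\n]+      -> channel(HIDDEN) ;
COMMENT      : '/*' .*? '*/'   -> skip ;
LINE_COMMENT : '//' ~[\r\n]*   -> skip ;

TEXT         : ~[{} \n\t\r"]+ ;
```

ANTLR 4 accepts parser and lexer rules in any order within a combined grammar, so moving the parser rules up only changes readability. The relative order of the lexer rules is what can affect tokenization.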

Good luck!

4
votes

Your TEXT rule is consuming your comments: its negated set also matches / and *, so comment openers count as ordinary TEXT characters. Rather than using a negated set, use something like:

TEXT: [a-zA-Z0-9_][/a-zA-Z0-9.;()\[\]_-]+ ;

That way, a TEXT token can no longer start at a comment opener, because its first character cannot be /.
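
To illustrate how a negated-set TEXT can swallow a comment, here is a minimal lexer sketch. The grammar name and the sample inputs are made up for illustration, and the line-comment rule uses the common ~[\r\n]* form rather than the one from the question. It relies on the ANTLR lexer preferring the longest match at the current position and, on a tie, the rule defined first.

```antlr
lexer grammar CommentSketch;   // hypothetical name, for illustration only

LINE_COMMENT : '//' ~[\r\n]*  -> skip ;
WS           : [ \t\r\n]+     -> skip ;

// Negated set from the question: '/' and '*' are not excluded,
// so they count as ordinary TEXT characters.
TEXT         : ~[{} \n\t\r"]+ ;

// Sample input 1:   foo //bar
//   "foo" is TEXT; at "//bar" LINE_COMMENT matches 5 characters and TEXT
//   also matches 5, so the tie goes to LINE_COMMENT (defined first).
//
// Sample input 2:   foo//bar
//   Lexing starts at "f", where only TEXT can match, and TEXT runs on
//   through the slashes: one TEXT token "foo//bar", no comment skipped.
```

The second input is the case to watch for: a comment that directly follows other text never gets a chance to match, because the lexer is already inside a TEXT token when it reaches the slashes.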