4
votes

I am trying to integrate an ANTLR-defined grammar into NetBeans, and so far valid syntax works fine. However, if you type any character that is not defined anywhere in the language (for example, the '?' character), the custom editor immediately crashes because the lexer cannot find a rule for that character.

Is there a way in ANTLR to catch and skip EVERY character that doesn't match a rule (and perhaps output an error message) without having the whole lexer crash and burn? I would like to just flag invalid characters, skip over them, and continue lexing, something like:

//some rules + tokens

invalidCharacter
    :    <<catch all other characters>>
        {System.out.println("undefined character entered!");}
    ;

Any help would be appreciated.


1 Answer

7
votes

If you're only interested in illegal chars inside the lexer, something as simple as this might do the trick for you:

grammar T;

@lexer::members {
  public List<String> errors = new ArrayList<String>();
}

parse
  :  .* EOF
  ;

INT
  :  '0'..'9'+
  ;

WORD
  :  ('a'..'z' | 'A'..'Z')+
  ;

SPACE
  :  ' ' {$channel=HIDDEN;}
  ;

INVALID
  :  . {
         errors.add("Invalid character: '" + $text + "' on line: " +
             getLine() + ", index: " + getCharPositionInLine());
       }
  ;

As you can see, only integers and ASCII words are accepted (single spaces are sent to the hidden channel). Any other character matches the INVALID rule, which adds an error to the List inside the lexer instead of making the lexer fail. Parsing a string like "abc 123 ? foo !" with the test class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    TLexer lexer = new TLexer(new ANTLRStringStream("abc 123 ? foo !"));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.toString(); // dummy call to toString() which causes all tokens to be created
    if(!lexer.errors.isEmpty()) {
      for(String error : lexer.errors) {
        System.out.println(error);
      }
    }
    else {
      TParser parser = new TParser(tokens);
      parser.parse();
    }
  }
}

produces the following output (the first three lines are the commands that generate, compile, and run the code):

java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

Invalid character: '?' on line: 1, index: 9
Invalid character: '!' on line: 1, index: 15
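
If it helps to see the catch-all idea without the ANTLR tooling, here is a minimal hand-written sketch of the same behaviour in plain Java. It is not what ANTLR generates: the class name SketchLexer and the "INT:"/"WORD:" token strings are made up for illustration. It only shows the principle that the last branch matches any single character, records an error, and lets lexing continue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; names are illustrative, not ANTLR API.
public class SketchLexer {
    // Collected error messages, mirroring the 'errors' list in the grammar.
    public final List<String> errors = new ArrayList<>();
    private final String input;

    public SketchLexer(String input) {
        this.input = input;
    }

    // Returns the recognised tokens; invalid characters are skipped and logged.
    public List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (isDigit(c)) {                       // INT : '0'..'9'+
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add("INT:" + input.substring(start, i));
            } else if (isLetter(c)) {               // WORD : ('a'..'z' | 'A'..'Z')+
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                tokens.add("WORD:" + input.substring(start, i));
            } else if (c == ' ') {                  // SPACE : hidden, so just drop it
                i++;
            } else {                                // INVALID : any other single char
                errors.add("Invalid character: '" + c + "' at index: " + i);
                i++;                                // skip it and keep lexing
            }
        }
        return tokens;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        SketchLexer lexer = new SketchLexer("abc 123 ? foo !");
        System.out.println(lexer.tokenize());
        for (String error : lexer.errors) {
            System.out.println(error);
        }
    }
}
```

Note that this sketch reports the zero-based index of the offending character itself, so the numbers it prints will not necessarily match the ones in the ANTLR output above.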