I am trying to parse APL expressions using ANTLR; it is a sort of APL source code parser. It parses normal characters but fails to parse special symbols (like '←').

expression = N←0

Lexer

/* Lexer Tokens. */

NUMBER
    : DIGIT+ ( '.' DIGIT+ )?
    ;

ASSIGN
    : '←'
    ;

// Helper rule: only used inside NUMBER, so it is marked as a fragment.
fragment DIGIT
    : [0-9]
    ;

Output:

[@0,0:1='99',<NUMBER>,1:0]
**[@1,4:6='â??',<'â??'>,2:0]**
[@2,7:6='<EOF>',<EOF>,2:3]

Can someone help me parse special characters from the APL language?

I am following the steps below.

  1. Wrote the grammar.
  2. "antlr4.bat" used to generate parser from grammar.
  3. "grun.bat" is used to generate token
You only showed that the arrow is not properly displayed to your console. Can you edit your question and add a code snippet that shows the parsing of your input with the resulting error message(s)? - Bart Kiers
Not sure why, but I am not able to edit my own question. I am following these steps: 1. Wrote the grammar. 2. Used "antlr4.bat" to generate the parser from the grammar. 3. Used "grun.bat" to generate the tokens listed in the question. I think I am failing to pass the character encoding. - Sumit Tyagi

1 Answer

  1. "grun.bat" is used to generate token

That just means your terminal cannot display the character properly. It does not mean the generated parser or lexer is unable to recognise ←.
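For what it's worth, the â?? in the token dump is consistent with the three UTF-8 bytes of ← (E2 86 90) being treated as three separate single-byte characters somewhere along the way. I can't tell from the question where that happens, so take the following as a JDK-only illustration of the effect and not a diagnosis. The class name MojibakeDemo and the helper reinterpret are made up for this sketch, and ISO-8859-1 stands in for whatever single-byte code page is actually involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// JDK-only illustration; no ANTLR classes are involved.
// MojibakeDemo and reinterpret are hypothetical names for this sketch.
class MojibakeDemo {

    // Encode the text as UTF-8, then decode those bytes with the given charset.
    static String reinterpret(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        String arrow = "\u2190"; // the APL assignment arrow

        // Assumption: ISO-8859-1 plays the role of the mismatched code page.
        String misread = reinterpret(arrow, StandardCharsets.ISO_8859_1);

        System.out.println(arrow.length());          // 1
        System.out.println(misread.length());        // 3
        System.out.println((int) misread.charAt(0)); // 226, i.e. 0xE2, the code point of 'â'
    }
}
```

If that is what is going on, note that both the ANTLR tool and TestRig (the class grun wraps) accept an -encoding option, for example -encoding UTF-8, so the grammar file and the input can be read with an explicit charset.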

Instead of using the bat file, test your lexer and parser by writing a small class yourself in your favourite IDE (which can display the characters properly).

Something like this:

grammar T;

expression
 : ID ARROW NUMBER
 ;

ID     : [a-zA-Z]+;
ARROW  : '←';
NUMBER : [0-9]+;
SPACE  : [ \t\r\n]+ -> skip;

and a main class:

import org.antlr.v4.runtime.*;

public class Main {
  public static void main(String[] args) {
    TLexer lexer = new TLexer(CharStreams.fromString("N ← 0"));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    System.out.println(parser.expression().toStringTree(parser));
  }
}

which will display:

(expression N ← 0)

EDIT

You could also try using the unicode escape for the arrow like this:

grammar T;

expression
 : ID ARROW NUMBER
 ;

ID     : [a-zA-Z]+;
ARROW  : '\u2190';
NUMBER : [0-9]+;
SPACE  : [ \t\r\n]+ -> skip;

and the Java class:

import org.antlr.v4.runtime.*;

public class Main {
  public static void main(String[] args) {
    String source = "N \u2190 0";
    TLexer lexer = new TLexer(CharStreams.fromString(source));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    System.out.println(source + ": " + parser.expression().toStringTree(parser));
  }
}

which will print:

N ← 0: (expression N ← 0)
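If you run such a class from a plain console instead of an IDE, the arrow may still come out garbled, because System.out encodes with the platform default. A minimal sketch of forcing UTF-8 on the Java side follows, using only the JDK. Utf8Console and its utf8 helper are names invented for this example, and the terminal itself still has to be configured to display UTF-8, which this code cannot do for you.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// JDK-only sketch; Utf8Console and utf8 are hypothetical names.
class Utf8Console {

    // Wrap a stream in an auto-flushing PrintStream that always encodes as UTF-8.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // The arrow is written as the bytes E2 86 90, whatever the platform default is.
        out.println("N \u2190 0");
    }
}
```

You could then print the parse tree through out instead of System.out.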