
I am trying to develop a small DSL with ANTLR for one of my projects. To that end, I wrote this lexer grammar...

lexer grammar SpamkillerLexer;

MAILBOX: 'Mailbox';
PASSWORD: 'Password';
HOST: 'Host';
USER: 'User';
FOLDER: 'Folder';
PORT: 'Port';
ACTIONS: 'Actions';

WHEN: 'When';
SUBJECT: 'Subject';
BODY: 'Body';
EQUALS: 'Equals';
CONTAINS: 'Contains';
THEN: 'Then';
DELETE: 'Delete';
REDIRECT: 'Redirect';
TO: 'to';

BR_OP: '{';
BR_CL: '}';
EQ: '=';

STRING: '"' ( '\\"' | . )*? '"';
LITERAL: [a-zA-Z_0-9]+;

WS : [ \n\t\r]+ -> skip ;

...and the parser...

parser grammar SpamkillerParser;
mailboxes: mailbox+;
mailbox: MAILBOX LITERAL BR_OP settings BR_CL;

settings: setting+;
setting: (key EQ STRING | ACTIONS EQ actions);
key: MAILBOX | PASSWORD | HOST | USER | FOLDER | PORT;

actions: BR_OP action* BR_CL;
action: WHEN condition THEN job;
condition: (SUBJECT | BODY) (EQUALS | CONTAINS) STRING;
job: (DELETE | (REDIRECT TO STRING));

My test file looks like this:

Mailbox Foobar {
    Port = "123"
    Host = "foohost"
    User = "foouser"
    Password = "foopass"
    Folder = "Inbox"
    Actions = {
        When Subject Equals "fooooo" Then Delete
        When Body Contains "fooooo" Then Redirect to "[email protected]"
    }
}

When I test the mailboxes rule in the ANTLR IntelliJ plugin, it works perfectly and I get the expected AST:

[Screenshot: AST]

But when I try to parse my test file programmatically, I get this error:

line 1:8 mismatched input 'Foobar' expecting LITERAL

I tried reordering my lexer rules, but none of my attempts got rid of the error. Does anyone know how to solve this?

My code for parsing my file looks like this:

import java.io.File;
import java.nio.charset.StandardCharsets;
import org.antlr.v4.runtime.*;
import org.apache.commons.io.FileUtils;

String input = FileUtils.readFileToString(new File("test.txt"), StandardCharsets.UTF_8);
CodePointCharStream inputStream = CharStreams.fromString(input);
SpamkillerLexer lexer = new SpamkillerLexer(inputStream);
CommonTokenStream commonTokenStream = new CommonTokenStream(lexer);
SpamkillerParser parser = new SpamkillerParser(commonTokenStream);
SpamkillerParser.MailboxesContext mailboxes = parser.mailboxes();

Answer


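You did not specify in your parser grammar which token vocabulary to use, so ANTLR creates implicit token definitions for the parser instead of using the ones from your lexer. Those implicit tokens are numbered in the order the parser grammar first mentions them, while the lexer numbers its rules in declaration order, so the token types no longer agree. MAILBOX happens to be type 1 on both sides, which is why 'Mailbox' is accepted, but the parser's LITERAL is type 2 (the lexer's PASSWORD), so the real LITERAL token 'Foobar' is rejected.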
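To see why the first token is accepted but Foobar is not, here is a toy model in plain Java. It assumes that the lexer numbers its rules 1, 2, 3, ... in declaration order and that, without tokenVocab, the parser numbers token names in the order it first encounters them in the parser grammar. The class TokenTypeMismatchDemo and its methods are hypothetical illustrations, not part of ANTLR or its generated code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not ANTLR code): numbers token names the way the
// lexer and an implicit parser vocabulary are assumed to, then lists disagreements.
final class TokenTypeMismatchDemo {
    // Lexer rules in declaration order; assumed to receive types 1, 2, 3, ...
    static final List<String> LEXER_ORDER = List.of(
        "MAILBOX", "PASSWORD", "HOST", "USER", "FOLDER", "PORT", "ACTIONS",
        "WHEN", "SUBJECT", "BODY", "EQUALS", "CONTAINS", "THEN", "DELETE",
        "REDIRECT", "TO", "BR_OP", "BR_CL", "EQ", "STRING", "LITERAL", "WS");

    // Token names in the order they first appear in the parser grammar
    // (rules mailbox, setting, key, action, condition, job).
    static final List<String> PARSER_ORDER = List.of(
        "MAILBOX", "LITERAL", "BR_OP", "BR_CL",
        "EQ", "STRING", "ACTIONS",
        "PASSWORD", "HOST", "USER", "FOLDER", "PORT",
        "WHEN", "THEN",
        "SUBJECT", "BODY", "EQUALS", "CONTAINS",
        "DELETE", "REDIRECT", "TO");

    // Assigns 1, 2, 3, ... in list order; a repeated name keeps its first type.
    static Map<String, Integer> numberInOrder(List<String> names) {
        Map<String, Integer> types = new LinkedHashMap<>();
        for (String name : names) {
            types.putIfAbsent(name, types.size() + 1);
        }
        return types;
    }

    // Names known to both vocabularies whose numeric types differ.
    static List<String> mismatches(Map<String, Integer> lexer, Map<String, Integer> parser) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : parser.entrySet()) {
            Integer lexerType = lexer.get(e.getKey());
            if (lexerType != null && !lexerType.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> lexer = numberInOrder(LEXER_ORDER);
        Map<String, Integer> parser = numberInOrder(PARSER_ORDER);
        System.out.println("MAILBOX: lexer=" + lexer.get("MAILBOX") + ", parser=" + parser.get("MAILBOX"));
        System.out.println("LITERAL: lexer=" + lexer.get("LITERAL") + ", parser=" + parser.get("LITERAL"));
        System.out.println("mismatched: " + mismatches(lexer, parser));
        // tokenVocab makes the parser reuse the lexer's numbering, modelled here
        // by comparing the lexer vocabulary with itself.
        System.out.println("with tokenVocab: " + mismatches(lexer, lexer));
    }
}
```

Under these assumptions MAILBOX gets type 1 on both sides, while the parser's LITERAL is type 2, the number the lexer gives PASSWORD. The empty result of mismatches(lexer, lexer) mirrors what the tokenVocab option achieves: one shared numbering.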

To fix this, provide the tokenVocab option:

parser grammar SpamkillerParser;

options {
   tokenVocab=SpamkillerLexer;
}

mailboxes: mailbox+;
mailbox: MAILBOX LITERAL BR_OP settings BR_CL;

settings: setting+;
setting: (key EQ STRING | ACTIONS EQ actions);
key: MAILBOX | PASSWORD | HOST | USER | FOLDER | PORT;

actions: BR_OP action* BR_CL;
action: WHEN condition THEN job;
condition: (SUBJECT | BODY) (EQUALS | CONTAINS) STRING;
job: (DELETE | (REDIRECT TO STRING));
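The tokenVocab option tells the ANTLR tool to take the token types from SpamkillerLexer.tokens, the vocabulary file it writes when it generates the lexer. Each symbolic entry in that file has the form NAME=type, and literal aliases appear on lines such as 'Mailbox'=1. As a debugging aid, here is a minimal sketch of a reader for that format; TokensFileReader is a hypothetical helper, not an ANTLR API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads the NAME=type lines of an ANTLR .tokens file,
// skipping blank lines and quoted literal entries such as 'Mailbox'=1.
final class TokensFileReader {
    static Map<String, Integer> symbolicTypes(String tokensFileContent) {
        Map<String, Integer> types = new LinkedHashMap<>();
        for (String rawLine : tokensFileContent.split("\\R")) {
            String line = rawLine.strip();
            int eq = line.lastIndexOf('=');
            // Skip blank or malformed lines and literal aliases ('...'=n).
            if (eq <= 0 || line.startsWith("'")) {
                continue;
            }
            String name = line.substring(0, eq);
            int type = Integer.parseInt(line.substring(eq + 1).strip());
            types.put(name, type);
        }
        return types;
    }
}
```

For example, if your build also writes a SpamkillerParser.tokens file, you could load both files with Files.readString(Path.of(...)), pass each to symbolicTypes, and compare the maps entry by entry; once tokenVocab is in place, every shared name should carry the same number.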