
I am using ANTLR 3.1.3 and generating a Python target. My lexer and parser accept very large files. Depending on command-line options or other parameters controlled at run time, I would like to capture a portion of the recognized input and stop parsing early. For example, suppose my language consists of a header and a body, the body might contain gigabytes of tokens, and I am only interested in the header. In that case I would like a rule that stops the lexer and parser without raising an exception. For performance reasons, I don't want to read the entire body.

grammar Example;

options {
  language=Python;
  k=2;
}

language:
    header
    body
    EOF
    ;

header:
    HEAD
    (STRING)*
    ;

body:
    BODY { if stopearly: help() }
    (STRING)*
    ;

// string literals
STRING: '"'
    (   
        '"' '"'
    |   NEWLINE
    |   ~('"'|'\n'|'\r')
    )*
    '"'
    ;

// Whitespace -- ignored
WS:
    (   ' '
    |   '\t'
    |   '\f'
    |   NEWLINE
    )+ { $channel=HIDDEN }
    ;

HEAD: 'head';
BODY: 'body';
fragment NEWLINE: '\r' '\n' | '\r' | '\n';

2 Answers


What about:

body:
    BODY {not stopearly}? => (STRING)*
;

?

That uses a gated semantic predicate to enable or disable parts of the language. I often use this to toggle language features depending on a version number. I'm not 100% certain it works here, though; you might have to move the predicate and the code following it into its own rule.
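To make the idea concrete, here is a rough, hand-written Python analogue of what the predicate on the body loop does. This is not ANTLR-generated code: `tokenize`, `Parser` and the `stopearly` constructor argument are hypothetical stand-ins. The `while` loop in `body` is only entered while `not stopearly` holds, and because `re.finditer` scans lazily, the rest of the body is never tokenized (apart from one token of lookahead). One thing the sketch makes visible: once the loop is switched off, the unread body tokens would not match `EOF` in `language`, so in this sketch that check is skipped as well when stopping early.

```python
import re

# Hypothetical lazy tokenizer (not the ANTLR lexer): re.finditer only scans
# as far as the caller asks, so unrequested body text is never tokenized.
TOKEN_RE = re.compile(r'(head|body)\b|("(?:""|[^"])*")')


def tokenize(text):
    """Yield (type, text) tokens; whitespace and unknown chars are skipped."""
    for m in TOKEN_RE.finditer(text):
        if m.group(2) is not None:
            yield ('STRING', m.group(2))
        else:
            yield (m.group(1).upper(), m.group(1))
    yield ('EOF', '')


class Parser:
    """Hand-written recursive-descent parser mirroring the Example grammar."""

    def __init__(self, tokens, stopearly):
        self.tokens = tokens
        self.stopearly = stopearly
        self.la = next(tokens)  # one token of lookahead

    def match(self, ttype):
        if self.la[0] != ttype:
            raise SyntaxError('expected %s, got %s' % (ttype, self.la[0]))
        tok = self.la
        if ttype != 'EOF':
            self.la = next(self.tokens)
        return tok

    def language(self):
        strings = self.header()
        self.body()
        # With the body loop disabled, the unread body tokens would fail an
        # EOF match, so the EOF check is skipped when stopping early.
        if not self.stopearly:
            self.match('EOF')
        return strings

    def header(self):
        self.match('HEAD')
        strings = []
        while self.la[0] == 'STRING':
            strings.append(self.match('STRING')[1])
        return strings

    def body(self):
        self.match('BODY')
        # Analogue of the gated predicate on (STRING)*: the loop is only
        # entered while the predicate "not stopearly" holds.
        while not self.stopearly and self.la[0] == 'STRING':
            self.match('STRING')
```

With `stopearly=True`, parsing `head "a" "b" body "x" "y" "z"` returns the two header strings and leaves `"x"` sitting in the lookahead, with `"y"` and `"z"` still unscanned.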


This is a Python-specific answer. I added this to my parser:

@parser::header
    {
    class QuitEarlyException(Exception):
        def __init__(self, value):
            self.value = value
        def __str__(self):
            return repr(self.value)
    }

and changed this:

body:
    BODY { if stopearly: raise QuitEarlyException('ok') }
    (STRING)*
    ;

Now I have a "try" block around my parser:

from ExampleParser import QuitEarlyException  # defined via @parser::header

try:
    parser.language()
except QuitEarlyException as e:
    print "stopped early"
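For anyone who wants to try the pattern without the ANTLR runtime, here is a self-contained Python sketch of the same exception-based early exit. The lazy `tokenize` generator, the hand-written `Parser` class and the `parse_header` helper are hypothetical stand-ins for the generated lexer and parser, not ANTLR's API. Here the exception carries the captured header strings (covering the "capture a portion of the input" part of the question), and `pulled` counts how many tokens were ever requested from the tokenizer.

```python
import re

# Hypothetical lazy tokenizer standing in for the generated lexer.
TOKEN_RE = re.compile(r'(head|body)\b|("(?:""|[^"])*")')


def tokenize(text):
    """Lazily yield (type, text) tokens; whitespace and unknown chars are skipped."""
    for m in TOKEN_RE.finditer(text):
        if m.group(2) is not None:
            yield ('STRING', m.group(2))
        else:
            yield (m.group(1).upper(), m.group(1))
    yield ('EOF', '')


class QuitEarlyException(Exception):
    """Carries whatever was captured before the parse was abandoned."""

    def __init__(self, value):
        Exception.__init__(self, value)
        self.value = value


class Parser:
    """Hypothetical hand-written parser mirroring the Example grammar."""

    def __init__(self, tokens, stopearly):
        self.tokens = tokens
        self.stopearly = stopearly
        self.pulled = 0          # tokens requested from the tokenizer
        self.header_strings = []
        self.la = self._next()   # one token of lookahead

    def _next(self):
        self.pulled += 1
        return next(self.tokens)

    def match(self, ttype):
        if self.la[0] != ttype:
            raise SyntaxError('expected %s, got %s' % (ttype, self.la[0]))
        tok = self.la
        if ttype != 'EOF':
            self.la = self._next()
        return tok

    def language(self):
        self.header()
        self.body()
        self.match('EOF')
        return self.header_strings

    def header(self):
        self.match('HEAD')
        while self.la[0] == 'STRING':
            self.header_strings.append(self.match('STRING')[1])

    def body(self):
        self.match('BODY')
        if self.stopearly:
            # Same idea as the grammar action: unwind straight out of language().
            raise QuitEarlyException(self.header_strings)
        while self.la[0] == 'STRING':
            self.match('STRING')


def parse_header(text, stopearly=True):
    """Return (header strings, tokens pulled), stopping after BODY if asked."""
    parser = Parser(tokenize(text), stopearly)
    try:
        return parser.language(), parser.pulled
    except QuitEarlyException as e:
        return e.value, parser.pulled


print(parse_header('head "a" "b" body "x" "y" "z"'))  # (['"a"', '"b"'], 5)
```

Only one body token (`"x"`, as lookahead) is scanned before the exception unwinds the parse; with `stopearly=False` all 8 tokens, including `EOF`, are pulled. Whether the real ANTLR runtime avoids reading the rest of a file also depends on how its character and token streams buffer input, which this sketch does not model.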