I'm trying to parse some messages using an ANTLR grammar. The messages have the following structure:

:20:REF123456
:72:Some narrative text which
may contain new lines and
occassionally other : characters
:80A:Another field

The target output is a table containing the text between the colons as a 'key' and the text until the next key as the value of that key. For example:

Key | Values
--------------------------------------
20  | REF123456
72  | Some narrative text which
      may contain new lines and
      occassionally other : characters
80A | Another field
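To pin down the target mapping, here is a minimal plain-Java sketch. It is not ANTLR, and the class name FieldSplitter is hypothetical. It assumes a key only ever appears at the start of a line, in the form :tag:, and that every other line continues the previous value, so a colon in the middle of a line is ordinary text. The key is everything between the two leading colons, which means the last field's key comes out as 80A.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldSplitter {

    // Assumed rule: a line starting with ":<tag>:" opens a new field;
    // any other line is a continuation of the current field's value.
    private static final Pattern KEY_LINE = Pattern.compile("^:([^:\\r\\n]+):(.*)$");

    static Map<String, String> split(String message) {
        Map<String, String> fields = new LinkedHashMap<>(); // preserves input order
        String currentKey = null;
        StringBuilder currentValue = new StringBuilder();

        for (String line : message.split("\\r?\\n|\\r")) {
            Matcher m = KEY_LINE.matcher(line);
            if (m.matches()) {
                if (currentKey != null) {
                    fields.put(currentKey, currentValue.toString());
                }
                currentKey = m.group(1);
                currentValue = new StringBuilder(m.group(2));
            } else if (currentKey != null) {
                // Continuation line: any colons in it are kept as plain text
                currentValue.append('\n').append(line);
            }
            // Lines before the first key are ignored
        }
        if (currentKey != null) {
            fields.put(currentKey, currentValue.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = ":20:REF123456\n" +
                ":72:Some narrative text which\n" +
                "may contain new lines and\n" +
                "occassionally other : characters\n" +
                ":80A:Another field";
        split(input).forEach((k, v) -> System.out.println(k + " | " + v));
    }
}
```

Running main prints 20 | REF123456, then the 72 value spread over three lines (its embedded line breaks are kept), then 80A | Another field. Two caveats of this sketch: a continuation line that itself begins with :tag: is treated as a new field, and a repeated key overwrites the earlier one in the map.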

Using http://danielveselka.blogspot.fr/2011/02/antlr-swift-fields-parser.html as a reference, I'm able to write a grammar that does this, but only as long as colons are not allowed in the value field.

Can anyone offer guidance on how to approach this problem?


I'd skip v3 and go with ANTLR v4. A quick demo of how to do this in v4 would look like this:

grammar Swift;

parse
 : entries? EOF
 ;

entries
 : entry ( LINE_BREAK entry )* 
 ;

entry
 : key value
 ;

key
 : ':' DATA ':'
 ;

value
 : line ( LINE_BREAK line )*
 ;

line
 : ( DATA | SPACES ) ( COLON | DATA | SPACES )*
 ;

LINE_BREAK
 : '\r'? '\n'
 | '\r'
 ;

COLON
 : ':'
 ;

DATA
 : ~[\r\n: \t]+
 ;

SPACES
 : [ \t]+
 ;
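To see why the mid-line colon is harmless, it helps to look at the token stream. The sketch below is a hypothetical hand-written approximation of the four lexer rules above (LINE_BREAK, COLON, DATA, SPACES); it is not ANTLR-generated code, and the class name TokenSketch is made up. Because the rules use disjoint character sets, a simple character-class scan gives the same tokens ANTLR's longest-match lexer would.

```java
import java.util.ArrayList;
import java.util.List;

class TokenSketch {

    // Hypothetical mirror of the Swift grammar's lexer rules, for illustration only
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\r' || c == '\n') {
                // LINE_BREAK : '\r'? '\n' | '\r'
                boolean crlf = c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n';
                i += crlf ? 2 : 1;
                tokens.add("LINE_BREAK");
            } else if (c == ':') {
                // COLON : ':'  (the ':' literals in the key rule are this token)
                i++;
                tokens.add("COLON");
            } else if (c == ' ' || c == '\t') {
                // SPACES : [ \t]+
                while (i < input.length() && (input.charAt(i) == ' ' || input.charAt(i) == '\t')) {
                    i++;
                }
                tokens.add("SPACES");
            } else {
                // DATA : ~[\r\n: \t]+
                int start = i;
                while (i < input.length() && "\r\n: \t".indexOf(input.charAt(i)) < 0) {
                    i++;
                }
                tokens.add("DATA(" + input.substring(start, i) + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("occassionally other : characters\n:80A:Another field"));
    }
}
```

For the two lines in main, the first begins with DATA and the second with COLON. Since line cannot start with COLON while key must, the token right after a LINE_BREAK is what lets the parser decide whether it is continuing the current value or starting a new entry; a colon later in a line is simply accepted by line's ( COLON | DATA | SPACES )* loop.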

Now all you need to do is attach a listener to a tree walker, listen for enterEntry events, and capture the key and value text of each entry. Here's how to do that:

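import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

// SwiftLexer, SwiftParser and SwiftBaseListener are generated by ANTLR from Swift.g4
public class Main {

    public static void main(String[] args) throws Exception {

        String input = ":20:REF123456\n" +
                ":72:Some narrative text which\n" +
                "may contain new lines and\n" +
                "occassionally other : characters\n" +
                ":80A:Another field";

        // CharStreams.fromString needs ANTLR 4.7+; older runtimes use new ANTLRInputStream(input)
        SwiftLexer lexer = new SwiftLexer(CharStreams.fromString(input));
        SwiftParser parser = new SwiftParser(new CommonTokenStream(lexer));

        // Walk the tree returned by the parse rule and handle every entry node
        ParseTreeWalker.DEFAULT.walk(new SwiftBaseListener() {
            @Override
            public void enterEntry(SwiftParser.EntryContext ctx) {
                // ":20:" -> "20"
                String key = ctx.key().getText().replace(":", "");
                // getText() includes the LINE_BREAK and SPACES tokens; collapse them to single spaces
                String value = ctx.value().getText().replaceAll("\\s+", " ");
                System.out.printf("key   -> %s\nvalue -> %s\n================\n", key, value);
            }
        }, parser.parse());
    }
}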

Running the demo above will print the following on your console:

key   -> 20
value -> REF123456
================
key   -> 72
value -> Some narrative text which may contain new lines and occassionally other : characters
================
key   -> 80A
value -> Another field
================