I am in the process of finalizing a grammar for a proprietary pattern language. It borrows a few regex syntax elements (like quantifiers) but it's also a lot more complex than regex, since it has to allow macros, different pattern styles etc.

My problem is that '*' does not match against the ID lexer rule like it's supposed to. As far as I can see, there is no other rule that could swallow the '*' token.

Here's the grammar I wrote:

grammar Pattern;

element:
        ID
        | macro;

macro:
        MACRONAME macroarg? ('*'|'+'|'?'|FROMTIL)?;

macroarg: '['( (element | MACROFREE ) ';')* (element | MACROFREE) ']';


and_con :
        element '&' element
        | and_con '&' element
        |'(' and_con ')';

head_con :
        'H[' block '=>' block ']';

expression :
        element
        | and_con
        | expression ' ' element
        | '(' expression ')';

block :
        element
        | and_con
        | or_con
        | '(' block ')';

blocksequence :
        (block ' '+)* block;

or_con :
         ((element | and_con) '|')+ (element | and_con)
        | or_con '|' (element | and_con)
        | '(' blocksequence (')|(' blocksequence)+ (')'|')*');

patternlist :
        (blocksequence ' '* ',' ' '*)* blocksequence;

sentenceord :
        'S=(' patternlist ')';

sentenceunord :
        'S={' patternlist '}';

pattern :
        sentenceord
        | sentenceunord
        |  blocksequence;      

multisentence :
        MS pattern;

clause :
        'CLS' ' '+ pattern;

complexpattern :
        pattern
        | multisentence
        | clause
        | SECTIONS ' ' complexpattern;

dictentry:
        NUM ';' complexpattern
        | NUM ';' NAME ';' complexpattern
        | COMMENT;

dictionary:
        (dictentry ('\r'|'\n'))* (dictentry)?;

ID : '*' ('*'|'+'|'?'|FROMTIL)?
        | ( '^'? '!'? ('F'|'C'|'L'|'P'|'CA'|'N'|'PE'|'G'|'CD'|'T'|'M'|'D')'=' NAME ('*'|'+'|'?'|FROMTIL)? '$'? );

MS : 'MS' [0-9];

SECTIONS: 'SEC' '=' ([0-9]+','?)+;

FROMTIL: '{'NUM'-'NUM'}';

NUM: [0-9]+;

NAME: CHAR+ | ',' | '.' | '*';

CHAR: [a-zA-Z0-9_äöüßÄÖÜ\-];

MACRONAME: '#'[a-zA-Z_][a-zA-Z_0-9]*;

MACROFREE: [a-zA-Z!]+;

COMMENT: '//' ~('\r'|'\n')*;

The complexpattern/pattern/element/block parser rules should accept a simple '*', and I can't figure out why they don't.

1 Answer

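In your macro rule, you used the literal '*'. In a combined grammar, ANTLR turns every literal that appears in a parser rule into an implicit token, and those implicit tokens take precedence over the explicit lexer rules. For the input "*", both the implicit '*' token and ID match exactly one character; since the match lengths are equal, the rule defined first wins, so the lexer emits the implicit '*' token and never produces an ID. The element rule only accepts ID or macro, which is why a single "*" is rejected.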
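One possible way around this is sketched below: stop asking ID to match a bare '*' and accept the implicit '*' token in the parser instead. This is only a sketch, not a tested drop-in replacement. The rule names `wildcard` and `quantifier` are hypothetical (they are not in the original grammar), only the changed rules are shown, and the rest of the grammar is assumed to stay as posted.

```antlr
// Sketch: only the rules that change are shown.
element
    : ID
    | wildcard          // hypothetical rule: a lone '*' with an optional quantifier
    | macro
    ;

// A bare '*' reaches the parser as the implicit literal token,
// so it is matched here rather than inside the ID lexer rule.
wildcard
    : '*' quantifier?
    ;

macro
    : MACRONAME macroarg? quantifier?
    ;

// Hypothetical helper rule shared by wildcard and macro.
quantifier
    : '*' | '+' | '?' | FROMTIL
    ;

// ID without the bare-'*' alternative; the prefixed form is unchanged.
ID  : '^'? '!'? ('F'|'C'|'L'|'P'|'CA'|'N'|'PE'|'G'|'CD'|'T'|'M'|'D') '=' NAME ('*'|'+'|'?'|FROMTIL)? '$'? ;
```

With this change, an input such as "**" is lexed as two '*' tokens and parsed as `wildcard` with a quantifier, instead of as a single ID. To check which token the lexer actually produces for a given input, dumping the token stream helps, for example with the TestRig `-tokens` option (`grun Pattern complexpattern -tokens`, assuming the usual `grun` alias is set up).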