I am in the process of finalizing a grammar for a proprietary pattern language. It borrows a few regex syntax elements (like quantifiers) but it's also a lot more complex than regex, since it has to allow macros, different pattern styles etc.
My problem is that '*' does not match against the ID lexer rule like it's supposed to. There is no other rule that could swallow the * token as far as i see.
Here's the grammar i wrote:
grammar Pattern;
element:
ID
| macro;
macro:
MACRONAME macroarg? ('*'|'+'|'?'|FROMTIL)?;
macroarg: '['( (element | MACROFREE ) ';')* (element | MACROFREE) ']';
and_con :
element '&' element
| and_con '&' element
|'(' and_con ')';
head_con :
'H[' block '=>' block ']';
expression :
element
| and_con
| expression ' ' element
| '(' expression ')';
block :
element
| and_con
| or_con
| '(' block ')';
blocksequence :
(block ' '+)* block;
or_con :
((element | and_con) '|')+ (element | and_con)
| or_con '|' (element | and_con)
| '(' blocksequence (')|(' blocksequence)+ (')'|')*');
patternlist :
(blocksequence ' '* ',' ' '*)* blocksequence;
sentenceord :
'S=(' patternlist ')';
sentenceunord :
'S={' patternlist '}';
pattern :
sentenceord
| sentenceunord
| blocksequence;
multisentence :
MS pattern;
clause :
'CLS' ' '+ pattern;
complexpattern :
pattern
| multisentence
| clause
| SECTIONS ' ' complexpattern;
dictentry:
NUM ';' complexpattern
| NUM ';' NAME ';' complexpattern
| COMMENT;
dictionary:
(dictentry ('\r'|'\n'))* (dictentry)?;
ID : '*' ('*'|'+'|'?'|FROMTIL)?
| ( '^'? '!'? ('F'|'C'|'L'|'P'|'CA'|'N'|'PE'|'G'|'CD'|'T'|'M'|'D')'=' NAME ('*'|'+'|'?'|FROMTIL)? '$'? );
MS : 'MS' [0-9];
SECTIONS: 'SEC' '=' ([0-9]+','?)+;
FROMTIL: '{'NUM'-'NUM'}';
NUM: [0-9]+;
NAME: CHAR+ | ',' | '.' | '*';
CHAR: [a-zA-Z0-9_äöüßÄÖÜ\-];
MACRONAME: '#'[a-zA-Z_][a-zA-Z_0-9]*;
MACROFREE: [a-zA-Z!]+;
COMMENT: '//' ~('\r'|'\n')*;
The complexpattern/pattern/element/block parser rules should accept a simple '*', and i can't figure out why they don't.