
OK, I posted a simplified example here earlier: Ambiguous ANTLR parser rule

I think that oversimplifying the example didn't work well for me, so I'm now adding the real example.

Here is the text to be parsed:

#ifndef _EVENTS_H
#define _EVENTS_H
#define EVENTS_LOGGER_VER 3.0f
/******************************************************************************************************
<Start of event definitions section - Do not edit this comment.
******************************************************************************************************/
#define EVT_FLOW_HW_ASSERTION_BASE              0x0     // Hw assertion base event
#define EVT_FLOW_HW_ASSERTION_PMG               0x1    // Hw assertion detected on PMG module. Module = 0x%x. Status is 0x%x.
#define EVT_I2C_SECTION_START                   0x20
#define EVT_I2C_DRIVER_ERROR                    0x26    //  I2C driver returns with error 0x%x on Device 0x%x Offset 0x%x
#define EVT_I2C_TARGET_DEVICE_ERROR             0x27    //  I2C interrupt on error: Status=0x%x%x
#define EVT_TIME_MEASUREMENTS                   0x2A    // Time measurement. Line : %d; Spare : %d; Time, us : %d
#define EVT_DFU_AFTER_UPDATE_STATE_REG          0x2D    // Going to DFU  (REG_RESET_STATUS = %x)
#define EVT_MNOT_SAFE_DEBUG_INFO_CTL_RO_P1      0xC3    // ctl ro data 0x99-0xa0: 0x%x 0x%x 0x%x

and here is the grammar:

grammar EventsHFile;

/*
 * Parser Rules
 */

prog : ifndefEvents defineEvents defineVersion event+ EOF;

ifndefEvents : IFNDEF '_EVENTS_H';

defineEvents : DEFINE '_EVENTS_H';

defineVersion: DEFINE 'EVENTS_LOGGER_VER' version=versionRule 'f';

versionRule:   REAL   ;

event : DEFINE EVT_HEADER eventName=eventNameRule HEX eventId=eventIdRule (COMMENT_HEADER commentRule)?;

eventNameRule : ID;

eventIdRule : HEX_VALUE;

commentRule: (ID | hexArgumentRule | decimalArgumentRule | numericArgumentRule | HEX | HEX_VALUE | COMMENTCHAR)+;

numericArgumentRule : '%x';

hexArgumentRule : HEX numericArgumentRule+;

decimalArgumentRule : '%d';

IFNDEF : '#ifndef';
DEFINE : '#define';

EVT_HEADER : 'EVT_';

COMMENT_HEADER : '//';

fragment
DIGIT                   :           [0-9];

fragment
LETTER                  :           [a-zA-Z];

fragment
UNDERSCORE              :           '_';

fragment
HEXADIGIT
    :   [0-9a-fA-F]
    ;

ID : LETTER (LETTER|DIGIT|UNDERSCORE)* ;

REAL : DIGIT+ '.' DIGIT+;

HEX : '0' [xX] ;

HEX_VALUE : HEXADIGIT HEXADIGIT* ;

COMMENTCHAR : ('(' | ')' | '=' | '-' | ':');

BLOCKCOMMENT :   '/*' .*? '*/' -> channel(HIDDEN);
WS     :   (' ' | '\r' | '\n' | '\t') -> channel(HIDDEN);

Now, the problem is, as explained in the previous post, that eventNameRule is probably ambiguous: it also captures the 'EVT_' prefix. This results in the following tree (I'm showing the tree for one event; all events look the same): [image: parse tree for a single event]

As usual, any help is appreciated.

Thanks, Busi

1 Answer


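The lexer operates independently of the parser: it tokenizes the input without knowing which token the parser expects next. When several lexer rules can match at the current position, the lexer picks the one that matches the most characters. For EVT_FLOW_HW_ASSERTION_BASE this is ID, which consumes the whole name, whereas EVT_HEADER matches only the first four characters.

You could define a separate lexer rule for event names: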

EVT_ID : 'EVT_' ID;

Put it before ID so that it gets matched: when multiple rules match the same number of characters (EVT_ID and ID in this case), the lexer chooses the rule that is defined first.
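
To make the two tie-breaking rules concrete, here is roughly how the lexer handles the first event line before and after adding EVT_ID. This is only an annotated sketch of the two relevant rules, with the resulting token types written as comments (whitespace on the hidden channel is left out); the rest of the grammar is assumed to be unchanged.

```antlr
// Input line:  #define EVT_FLOW_HW_ASSERTION_BASE 0x0
//
// Original rules: EVT_HEADER can match only 'EVT_' (4 characters), while ID
// can match the whole name, so the longest match wins:
//   DEFINE  ID  HEX  HEX_VALUE
//
// With EVT_ID declared before ID, both rules match the whole name, and the
// tie is broken in favour of the rule that appears first in the grammar:
//   DEFINE  EVT_ID  HEX  HEX_VALUE

EVT_ID : 'EVT_' ID ;                              // must come before ID
ID     : LETTER (LETTER | DIGIT | UNDERSCORE)* ;
```

If EVT_ID were declared after ID instead, the tie would go to ID again and the parser would never see an EVT_ID token.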

EDIT: You need to change your event rule accordingly:

grammar EventsHFile;

/*
 * Parser Rules
 */

prog : ifndefEvents defineEvents defineVersion event+ EOF;

ifndefEvents : IFNDEF '_EVENTS_H';

defineEvents : DEFINE '_EVENTS_H';

defineVersion: DEFINE 'EVENTS_LOGGER_VER' version=versionRule 'f';

versionRule:   REAL   ;

event : DEFINE eventName=eventNameRule HEX eventId=eventIdRule (COMMENT_HEADER commentRule)?;

eventNameRule : EVT_ID;

eventIdRule : HEX_VALUE;

commentRule: (ID | hexArgumentRule | decimalArgumentRule | numericArgumentRule | HEX | HEX_VALUE | COMMENTCHAR)+;

numericArgumentRule : '%x';

hexArgumentRule : HEX numericArgumentRule+;

decimalArgumentRule : '%d';

IFNDEF : '#ifndef';
DEFINE : '#define';

EVT_ID : EVT_HEADER ID;
EVT_HEADER: 'EVT_';
COMMENT_HEADER : '//';

fragment
DIGIT                   :           [0-9];

fragment
LETTER                  :           [a-zA-Z];

fragment
UNDERSCORE              :           '_';

fragment
HEXADIGIT
    :   [0-9a-fA-F]
    ;

ID : LETTER (LETTER|DIGIT|UNDERSCORE)* ;

REAL : DIGIT+ '.' DIGIT+;

HEX : '0' [xX] ;

HEX_VALUE : HEXADIGIT HEXADIGIT* ;

COMMENTCHAR : ('(' | ')' | '=' | '-' | ':');

BLOCKCOMMENT :   '/*' .*? '*/' -> channel(HIDDEN);
WS     :   (' ' | '\r' | '\n' | '\t') -> channel(HIDDEN);

Note that your single-line comment rule is also wrong, as it doesn't match '.' characters, and several of your sample comments contain them. I'll leave that one to you; there should be plenty of examples.
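
As a starting point, one common approach is to lex the whole single-line comment as one token instead of assembling it from parser rules. The sketch below assumes that is acceptable for your use case: LINE_COMMENT is a name I made up, it replaces COMMENT_HEADER and commentRule in the event rule, and the %x / %d placeholders would then have to be pulled out of the token text in your own code rather than appearing as parse-tree nodes.

```antlr
// Sketch only: LINE_COMMENT is a hypothetical token, not part of the
// original grammar. It takes the place of (COMMENT_HEADER commentRule)?.
event : DEFINE eventName=eventNameRule HEX eventId=eventIdRule LINE_COMMENT? ;

// '//' followed by every character up to, but not including, the line break.
// Any character is accepted, so '.', ',' and ';' no longer cause errors.
// Declare this before COMMENT_HEADER (or drop COMMENT_HEADER entirely) so
// that an empty '//' comment is also lexed as LINE_COMMENT.
LINE_COMMENT : '//' ~[\r\n]* ;
```

If you do need the arguments as separate tokens, a lexer mode for the comment body is another option, but modes are only allowed in a separate lexer grammar, not in a combined grammar like this one.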