
I am calling a self-defined function (assertFailsWith) inside a function body, but I get an ANTLR parser error on the line "assertFailsWith(IllegalArgumentException::class) {":

mismatched input '{' expecting {NL, '}',

I am using the Kotlin grammar from https://github.com/antlr/grammars-v4/tree/master/kotlin/kotlin

Do I need to change anything in the part below to get rid of the error?

functionBody
    : block
    | ASSIGNMENT NL* expression
    ;

block
    : LCURL statements RCURL
    ;



@Test
fun `Create invalid test`() {
    assertFailsWith(IllegalArgumentException::class) {
        // Variables
        val realVocabPath = "realVocabPath"
        val realAlphabetPath = "realAlphabetPath"
        val vocabFactory = VocabFactory(mockFileLoader, 0.6f)

        // Execute
        val vocab = vocabFactory.create(realVocabPath, realAlphabetPath, mockEngineSpec)

        // Verify
        assertEquals(mockWordPieceVocab, vocab)
    }
}

1 Answer


It's a bug in the lexer grammar, not a problem with the nested function call (the lambda passed to assertFailsWith). Because of the bug, the lexer hands the parser an unexpected token stream, and the parser trips up and cannot recover from it.
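For reference, the line the parser reports is an ordinary call whose last argument is a lambda written after the parentheses (Kotlin's trailing-lambda syntax). The sketch below uses a hypothetical stand-in, expectFailure, with the same call shape as assertFailsWith; it only illustrates that both spellings are the same call and is not the kotlin.test implementation.

```kotlin
import kotlin.reflect.KClass

// Hypothetical stand-in with the same call shape as assertFailsWith: the last
// parameter is a function type, so callers may pass it as a trailing lambda.
@Suppress("UNCHECKED_CAST")
fun <T : Throwable> expectFailure(type: KClass<T>, block: () -> Unit): T {
    try {
        block()
    } catch (e: Throwable) {
        // Return the exception if it has the expected type, otherwise rethrow it.
        if (type.isInstance(e)) return e as T
        throw e
    }
    throw AssertionError("Expected ${type.simpleName} to be thrown")
}

// Trailing-lambda form, as in the question's test.
fun trailingForm(): IllegalArgumentException =
    expectFailure(IllegalArgumentException::class) {
        require(0.6f > 1f) { "ratio too small" }
    }

// The same call with the lambda inside the parentheses.
fun parenthesizedForm(): IllegalArgumentException =
    expectFailure(IllegalArgumentException::class, { require(false) })
```

A grammar that handles one form should handle the other, which is why the error points elsewhere.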

If you tokenize the input .6f 0.6f 1.6f, you will see that the lexer produces these tokens:

RealLiteral               `.6f`
IntegerLiteral            `0`
RealLiteral               `.6f`
RealLiteral               `1.6f`

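As you can see, 0.6f is not recognized as a single RealLiteral token. Neither optional integer part of DoubleLiteral can start with a 0, so the lexer emits IntegerLiteral 0 followed by RealLiteral .6f. You can verify this by changing 0.6f into 1.6f: your parser will then not produce any errors.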
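To make the longest-match behaviour concrete, here is a small Kotlin simulation. The two regular expressions are simplified, hypothetical stand-ins for IntegerLiteral and for the float form of DoubleLiteral in the current grammar (underscores and exponents are left out); the loop picks the longest match at each position, as an ANTLR lexer does.

```kotlin
import java.util.regex.Pattern

// Hypothetical, simplified stand-ins for the grammar's token rules.
val integerLiteral: Pattern = Pattern.compile("[1-9][0-9]*|[0-9]")
// (DecDigitNoZero DecDigit*)? '.' DecDigit+ followed by an 'f' suffix.
val oldRealLiteral: Pattern = Pattern.compile("(?:[1-9][0-9]*)?\\.[0-9]+[fF]")

data class Token(val type: String, val text: String)

// Longest match wins at each position; on a tie the rule listed first wins.
fun tokenize(input: String, rules: List<Pair<String, Pattern>>): List<Token> {
    val tokens = mutableListOf<Token>()
    var pos = 0
    while (pos < input.length) {
        // Skip whitespace between tokens.
        if (input[pos].isWhitespace()) { pos++; continue }
        var best: Token? = null
        for ((type, pattern) in rules) {
            // Try to match this rule starting exactly at pos.
            val m = pattern.matcher(input).region(pos, input.length)
            if (m.lookingAt()) {
                val text = input.substring(pos, m.end())
                if (best == null || text.length > best.text.length) best = Token(type, text)
            }
        }
        val token = best ?: error("no rule matches at index $pos")
        tokens += token
        pos += token.text.length
    }
    return tokens
}
```

Calling tokenize(".6f 0.6f 1.6f", listOf("IntegerLiteral" to integerLiteral, "RealLiteral" to oldRealLiteral)) yields the same four tokens listed above: .6f, 0, .6f and 1.6f.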

To fix this, change:

DoubleLiteral
    : ( (DecDigitNoZero DecDigit*)? '.'
      | (DecDigitNoZero (DecDigit | '_')* DecDigit)? '.')
     ( DecDigit+
      | DecDigit (DecDigit | '_')+ DecDigit
      | DecDigit+ [eE] ('+' | '-')? DecDigit+
      | DecDigit+ [eE] ('+' | '-')? DecDigit (DecDigit | '_')+ DecDigit
      | DecDigit (DecDigit | '_')+ DecDigit [eE] ('+' | '-')? DecDigit+
      | DecDigit (DecDigit | '_')+ DecDigit [eE] ('+' | '-')? DecDigit (DecDigit | '_')+ DecDigit
     )
    ;

into:

DoubleLiteral
    : ( (DecDigitNoZero DecDigit* | '0')? '.'
      | (DecDigitNoZero (DecDigit | '_')* DecDigit)? '.')
     ( DecDigit+
      | DecDigit (DecDigit | '_')+ DecDigit
      | DecDigit+ [eE] ('+' | '-')? DecDigit+
      | DecDigit+ [eE] ('+' | '-')? DecDigit (DecDigit | '_')+ DecDigit
      | DecDigit (DecDigit | '_')+ DecDigit [eE] ('+' | '-')? DecDigit+
      | DecDigit (DecDigit | '_')+ DecDigit [eE] ('+' | '-')? DecDigit (DecDigit | '_')+ DecDigit
     )
    ;

and your parser will be able to properly parse your input.
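As a quick sanity check of the change, the two versions of the rule can be compared as regular expressions. This is a hypothetical, simplified translation (first alternative only, no underscores or exponents, with an f suffix appended as in 0.6f), not the generated lexer.

```kotlin
// Hypothetical, simplified regex translations of the first alternative of
// DoubleLiteral (no underscores, no exponent) with an 'f' suffix appended.
val oldRule = Regex("(?:[1-9][0-9]*)?\\.[0-9]+[fF]")   // (DecDigitNoZero DecDigit*)? '.' DecDigit+
val newRule = Regex("(?:[1-9][0-9]*|0)?\\.[0-9]+[fF]") // (DecDigitNoZero DecDigit* | '0')? '.' DecDigit+

// Returns the literals that the given rule accepts as one complete token.
fun acceptedBy(rule: Regex, literals: List<String>): List<String> =
    literals.filter { rule.matches(it) }
```

With the samples .6f, 0.6f, 1.6f and 10.25f, the old pattern rejects only 0.6f, while the new one accepts all four.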

I submitted a fix for it here: https://github.com/antlr/grammars-v4/pull/1850