
I'm writing a "compiler" of sorts: it reads a description of a game (with rooms, characters, things, etc.). Think of it as a visual version of an Adventure-style game, but with much simpler problems.

When I run my "compiler" I'm getting a syntax error on my input, and I can't figure out why. Here's the relevant section of my yacc input:

character
    : char-head general-text character-insides { PopChoices(); }
    ;
character-insides
    : LEFTBRACKET options RIGHTBRACKET
    ;
char-head
    : char-namesWT opt-imgsWT char-desc opt-cond
    ;
char-desc
    : general-text { SetText($1); }
    ;
char-namesWT
    : DOTC ID WORD { AddCharacter($3, $2); expect(EXP_TEXT); }
    ;
opt-cond
    : %empty
    | condition
    ;
condition
    : condition-reason condition-main general-text
        { AddCondition($1, $2, $3); }
    ;
condition-reason
    : DOTU { $$ = 'u'; }
    | DOTV { $$ = 'v'; }
    ;
condition-main
    : money-conditionWT
    | have-conditionWT
    | moves-conditionWT
    | flag-conditionWT
    ;
have-conditionWT
    : PERCENT_SLASH opt-bang ID
        { $$ = MkCondID($1, $2, $3) ; expect(EXP_TEXT); }
    ;
opt-bang
    : %empty { $$ = TRUE; }
    | BANG { $$ = FALSE; }
    ;
ID
    : WORD
    ;

Things in all caps are terminal symbols; things in lower or mixed case are non-terminals. If a non-terminal ends in WT, then it "wants text": what comes after it may be arbitrary text.

Background: I have written my own token recognizer in C++ because(*) I want the syntax to be able to change the lexer's behavior. Two types of tokens should be matched only when the syntax expects them: FILENAME (with slashes and other non-alphanumeric characters) and TEXT, which means "all the text from here to the end of the line" (but not starting with certain keywords).

The function "expect" tells the lexer when to look for these two symbols. The expectation is reset to EXP_NORMAL after each token is returned.
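A minimal sketch of such a one-shot expectation, in C++ to match the question's scanner. Only expect(), EXP_TEXT and EXP_NORMAL come from the question; the Token enum, the Scanner class, the set of characters that count as "keywords", and skipping newlines before TEXT are all assumptions for illustration, and FILENAME is left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token kinds for this sketch only; a real scanner would use
// the values from the bison-generated header.
enum Token { T_EOF, T_ERROR, DOTC, DOTU, DOTV, PERCENT_SLASH,
             LEFTBRACKET, RIGHTBRACKET, WORD, TEXT };

// Modes named after the question's expect() calls (FILENAME omitted).
enum Expectation { EXP_NORMAL, EXP_TEXT };

class Scanner {
public:
    explicit Scanner(std::string src) : src_(std::move(src)) {}

    // Called from a grammar action to change how the next token is scanned.
    void expect(Expectation e) { mode_ = e; }

    Token next() {
        Expectation mode = mode_;
        mode_ = EXP_NORMAL;               // an expectation lasts for one token
        skipSpace();                      // also skips newlines (assumption)
        if (pos_ >= src_.size()) { lexeme_.clear(); return T_EOF; }
        if (mode == EXP_TEXT && !atKeyword()) {
            // TEXT: everything up to the end of the current line.
            std::size_t eol = src_.find('\n', pos_);
            if (eol == std::string::npos) eol = src_.size();
            lexeme_ = src_.substr(pos_, eol - pos_);
            pos_ = eol;
            return TEXT;
        }
        if (match(".c")) return DOTC;
        if (match(".u")) return DOTU;
        if (match(".v")) return DOTV;
        if (match("%/")) return PERCENT_SLASH;
        if (match("[")) return LEFTBRACKET;
        if (match("]")) return RIGHTBRACKET;
        std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_'))
            ++pos_;
        if (pos_ == start) {              // unrecognized character
            lexeme_ = src_.substr(pos_++, 1);
            return T_ERROR;
        }
        lexeme_ = src_.substr(start, pos_ - start);
        return WORD;
    }

    const std::string& lexeme() const { return lexeme_; }

private:
    void skipSpace() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
    }
    // Hypothetical keyword starts: text beginning with these is never TEXT.
    bool atKeyword() const {
        char c = src_[pos_];
        return c == '.' || c == '%' || c == '[' || c == ']';
    }
    // Consume kw if the input continues with it.
    bool match(const std::string& kw) {
        if (src_.compare(pos_, kw.size(), kw) != 0) return false;
        lexeme_ = kw;
        pos_ += kw.size();
        return true;
    }

    std::string src_;
    std::size_t pos_ = 0;
    Expectation mode_ = EXP_NORMAL;
    std::string lexeme_;
};
```

Because next() reads and clears the mode as its first step, an expectation set by a grammar action applies to exactly one token. That only helps if the action runs before the parser asks for that token. As far as I understand bison's LALR tables, it reduces without fetching a lookahead only in states whose sole action is a default reduction; elsewhere the next token has already been scanned, in the old mode, by the time the action runs.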

I have added code to yylex that prints out the tokens as it recognizes them, and it looks to me like the tokenizer is working properly -- returning the tokens I expect.
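A minimal sketch of that kind of trace, assuming the real scanner can be wrapped rather than edited. traceLine() and tracedLex() are hypothetical helpers, and the scanner is assumed to hand back each token's name and matched text; the output copies the "token: <number>: <name>/<text>" shape of the debug output quoted in this question.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: one trace line in the "token: <number>: <name>/<text>"
// shape used by the question's debug output.
std::string traceLine(int token, const std::string& name, const std::string& text) {
    std::ostringstream line;
    line << "token: " << token << ": " << name << '/' << text;
    return line.str();
}

// Hypothetical wrapper: 'scan' is assumed to return the token number and
// fill in the token's name and matched text. Each token is logged, then
// returned unchanged to the caller (e.g. the parser's yylex()).
template <typename Scan>
int tracedLex(Scan scan, std::ostream& trace) {
    std::string name, text;
    int token = scan(name, text);
    trace << traceLine(token, name, text) << '\n';
    return token;
}
```

Keeping the logging in a wrapper makes it easy to drop from release builds without touching the scanner itself.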

(*) Also because I want to be able to ask the tokenizer for the column where the error occurred, and get the contents of the line being scanned at the time so I can print out a more useful error message.
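One way to use that column and line information, sketched under assumptions: the tokenizer is assumed to report 1-based first and last columns (treated here as inclusive) plus the text of the line being scanned, and formatError() is a hypothetical helper, not part of bison. It writes a "line N, columns A-B:" header like the error message in this question, then the offending line with a caret underline.

```cpp
#include <string>

// Hypothetical error formatter. 'first' and 'last' are assumed to be 1-based,
// inclusive columns from the tokenizer; 'lineText' is the line being scanned.
std::string formatError(const std::string& file, int line, int first, int last,
                        const std::string& msg, const std::string& lineText) {
    std::string out = file + ": line " + std::to_string(line) +
                      ", columns " + std::to_string(first) + "-" +
                      std::to_string(last) + ": " + msg + "\n";
    out += lineText + "\n";
    // Pad up to the first column, then mark every column in the range.
    out += std::string(first > 1 ? first - 1 : 0, ' ');
    out += std::string(last >= first ? last - first + 1 : 1, '^');
    out += "\n";
    return out;
}
```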

Here is the relevant part of the input:

.c Wendy wendy
    OK, now you caught me, what do you want to do with me?
  .u %/lasso    You won't catch me like that.
  [

Here is the last part of the debugging output from yylex:

token: 262: DOTC/
token: 289: WORD/Wendy
token: 289: WORD/wendy
token: 292: TEXT/OK, now you caught me, what do you want to do with me?
token: 286: DOTU/
token: 274: PERCENT_SLASH/%/
token: 289: WORD/lasso
token: 292: TEXT/You won't catch me like that.
token: 269: LEFTBRACKET/

Here's my error message:

: line 124, columns 3-4: syntax error, unexpected LEFTBRACKET, expecting TEXT [

To help you understand the grammar rules above, here is the relevant part of the input-syntax description that I wrote the yacc code from.

// Character:
//      .c id charactername,[imagename,[animationname]]
//        description-text
//        .u    condition on the character being usable [optional]
//        .v    condition on the character being visible [optional]
//      [
//        (options)
//      ]
// Conditions:
//        %$[-]n        Must [not] have at least n dollars
//        %/[-]name     Must [not] have named thing
//        %t-nnn        At/before specified number of moves
//        %t+nnn        At/after specified number of moves
//        %@[-]name     named flag must [not] be set
// Condition-char: $, /, t, or @, as described above
//
// Condition:
//        % condition-char (identifier/int) ['/' text-if-fail ]
// description-text: Can be either on-line text or multi-line text
//      On-line text is the rest of the line

Brackets mark optional items, but a bracket standing alone on its line (represented by LEFTBRACKET and RIGHTBRACKET in the yacc grammar) is an actual token, e.g. the "[" and "]" lines around "(options)" in the Character description above.

What am I doing wrong?

Turn on YYDEBUG so you can see what state you're in when the error occurs. Best guess is it's having problems finding the end of general-text, which you don't show. - Chris Dodd

2 Answers


To debug parsing problems in your grammar, you need to understand the shift/reduce machine that yacc/bison produces (described in the .output file produced with the -v option), and you need to look at the trail of states that the parser goes through to reach the problem you see.

To enable debugging code in the parser (which can print the states and the shift and reduce actions as they occur), you need to compile with -DYYDEBUG or put #define YYDEBUG 1 at the top of your grammar file. The debugging code is controlled by the global variable yydebug: set it to non-zero to turn on the trace and to zero to turn it off. I often use the following in main:

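#ifdef YYDEBUG
    // In main(); getenv() and atoi() come from <cstdlib>.
    extern int yydebug;                   // defined in the bison-generated parser
    if (char *p = getenv("YYDEBUG"))      // C++: declaration inside the if condition
        yydebug = atoi(p);                // 0 turns the trace off, non-zero turns it on
#endif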

Then you can include -DYYDEBUG in your compiler flags for debug builds, and turn on the trace at run time by setting the environment variable before running your program, e.g. setenv YYDEBUG 1 (csh) or export YYDEBUG=1 (sh/bash).


I suppose your syntax error message was generated by bison. What is striking is that it claims to have found a LEFTBRACKET when it expects a [. Naively, you might expect it to be satisfied with the LEFTBRACKET it found, but of course bison knows nothing about LEFTBRACKET except its numeric value, which will be some integer larger than 256.

The only reason bison might expect [ is if your grammar includes the terminal '['. But since your scanner seems to return LEFTBRACKET when it sees a [, the parser will never see '['.
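To make the numbering point concrete, here is a small C++ sketch. The named-token values are copied from the question's yylex trace (LEFTBRACKET is 269 there); literalToken() is a hypothetical helper that returns the character code, which is how bison numbers a single-character literal token such as '['. The static_asserts compile only because the two numbers differ.

```cpp
// Token numbers copied from the question's yylex trace; a real build takes
// them from the bison-generated header instead.
enum NamedToken {
    DOTC = 262,
    LEFTBRACKET = 269,
    PERCENT_SLASH = 274,
    DOTU = 286,
    WORD = 289,
    TEXT = 292
};

// Bison uses the character code itself as the token number for a
// single-character literal such as '[', so it is always below 256.
// Named tokens start at 258; 256 and 257 are reserved by bison.
constexpr int literalToken(char c) {
    return static_cast<unsigned char>(c);
}

// '[' is character code 91 in ASCII.
static_assert(literalToken('[') == 91, "ASCII '[' is 91");

// The parser compares numbers only, so '[' (91) and LEFTBRACKET (269)
// are unrelated tokens as far as it is concerned.
static_assert(literalToken('[') != LEFTBRACKET,
              "a rule written with '[' never matches LEFTBRACKET");
```

If a grammar rule did use '[', the scanner would have to return '[' itself for that input rather than LEFTBRACKET.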