0
votes

I have a question regarding my parser I'm writing in C. It's suppose to use left-recursion (gives left associativity for operators) and then return the expression in postfix notation. Specifically for my Expr and Term rules. Currently I'm running into a problem where I believe this is not occurring. Here's my grammar:

extern int yylex(); /* The next token function. */
extern char *yytext; /* The matched token text. */
extern int yyleng; /* The token text length. */
void yyerror(char *s);
#define YYSTYPE long /* 64 bit so can hold pointer and int */
%}

// These token definitions populate y.tab.h
// Single character tokens are their own token numbers (e.g. scanner returns
// the value ';' for the semicolon token)
%token INT_TOK 1
%token CHR_TOK 2
%token ASSIGN_TOK 3
%token INTLIT_TOK 4
%token IDENT_TOK 5

%%
Prog    : IDENT_TOK '{' StmtSeq '}'    { $$ = (long) strdup(yytext); }  ;
StmtSeq : Stmt ';' StmtSeq                                                        ;
StmtSeq :                                                                         ;

Assign  : LHS ASSIGN_TOK Expr         { printf("%s =\n",(char *)$1); } ;
LHS     : IDENT_TOK                   { $$ = (long) strdup(yytext);  } ;

Stmt    : Decl;
Stmt    : Assign;
Decl    : Type IDLst;
Type    : INT_TOK;
Type    : CHR_TOK;
IDLst   : IDENT_TOK MLst;
MLst    : ',' IDLst;
MLst    : ;
Expr    : Term MExpr;
MExpr   : AddOp Term MExpr            { printf("%s ",(char *)$1); } ;
MExpr   : ;
Term    : Factor MTerm;
MTerm   : MultOp Factor MTerm         { printf("%s ",(char *)$1); } ;
MTerm   : ;
Factor  :  '(' Expr ')';
Factor  :  '-' Factor;
Factor  : INTLIT_TOK                  { printf("%s ",yytext); }
Factor  : IDENT_TOK                   { printf("%s ",yytext); }
AddOp   : '-'                                               { $$ = (long) strdup(yytext); }  ;
AddOp   : '+'                                               { $$ = (long) strdup(yytext); }  ;
MultOp  : '*'                                               { $$ = (long) strdup(yytext); }  ;
MultOp  : '/'                                               { $$ = (long) strdup(yytext); }  ;
%%

The test file I'm using is this:

test1 {
  int amt, box;
  int x, y, w;
  x := 4 - 2 - 1;             // 4 2 - 1 - x =
  amt := 5 * y - 2;           // 5 y * 2 - amt =
  x := 5 * (y - 2);           // 5 y 2 - * x =
  box := 5 * x / amt + 3 * 4; // 5 x * amt / 3 4 * + box =
  y := z; w:= 1;              // z y =    1 w =
  }

The commented expressions denote the output I should get. Thus my grammar should return,

  1. x := 4 - 2 - 1; should produce 4 2 - 1 - x =
  2. amt := 5 * y - 2; should produce 5 y * 2 - amt =
  3. x := 5 * (y - 2); should produce 5 y 2 - * x =
  4.box := 5 * x / amt + 3 * 4; should produce 5 x * amt / 3 4 * + box =
  5. y := z; w:= 1;   should produce z y =    1 w =

My grammar is returning,

1. 4 2 1 - - x =
2. I get the correct output
3. I get the correct output
4. I get the correct output
5. 5 x amt / * 3 4 * + box =

From my understanding, it seems as if my operators are not left associative. Anyone know why this would be?

1
That grammar is not left-recursive. Left-recursive would be expr: term | expr AddOp term.rici
Also, it is almost always wrong to refer to yytext in a yacc action. And if you want to allow two possible semantic types, useca semantic union.rici
In addition to what rici said: It's by no means guaranteed that long has 64 bits on 64-bit platforms (for example, sizeof(long) is 4 on both 32-bit and 64-bit Windows), so it's not safe to store a pointer in a long.sepp2k
If you want a integer type that's large enough to hold a pointer, you want intptr_t or uintptr_t. But in this case, you really want a %union as rici said.sepp2k
By the way, LALR(1) parser generation like in Yacc is in fact well suited to left-recursion; left recursion runs in a constant Yacc-stack depth regardless of the depth of the recursion. This is because the resulting state machine intersperses shift and reduce actions. Right recursion gives Yacc parsers indigestion; partial derivations pile up on the Yacc stack which is then subject to a cascade of reductions when the end of the expression is reached.Kaz

1 Answers

1
votes

A left-recursive production is a production in which the non-terminal being produced is the first (leftmost) symbol on the right-hand side. For example,

MTerm: MTerm MultOp Factor

A right-recursive production is one in which the non-terminal being produced is the last (rightmost) symbol on the right-hand side. For example,

MTerm: MultOp Factor MTerm

Your grammar has no left-recursive rules, and many right-recursive rules. So it is somewhat unsurprising that it does not produce left associativity.

The grammar appears to be the result of an attempt to remove left-recursion in order to produce an LL grammar, although you appear to be using yacc/bison and the assignment assumes that left-recursion is possible, both of which suggest that you require an LR grammar. LR grammars require neither left-factoring nor left-recursion elimination.