How to print parser tree in Yacc (BISON)

Question

I made a parser for the C- language using BISON and FlEX. It works and prints "syntax error" in terminal if given c- input code is syntactically wrong, otherwise print nothing.

But i want to print the parser tree relevant to given c- input code as the output of my parser. How do i do that? Is there function in BISON which can be used to print the parser tree?

Do you mean you want to print the AST for an expression you parse? If so, you'll need to implement that yourself -- I'm reasonably certain neither yacc nor Bison has anything to do it for you. — Jerry Coffin

Kaz Kaz · Accepted Answer · 2012-03-13T20:52:55

The TXR language (http://www.nongnu.org/txr) uses Flex and Yacc for parsing its input. You can see the parse tree if you give it the -v option.

E.g.:

$ ./txr -v -c "@/[a-z]*|foo/"
spec:
(((text (#<sys:regex: 9d99268> or (0+ (set (#\a . #\z))) (compound #\f #\o #\o)))))

You construct the tree in the parser actions and print it yourself with a tree-printing routine. I used a Lisp-like object representation to make life easier. Writing this out is handled by a recursive printing function which recognizes all the possible object types and renders them into notation. For instance above you see objects of character type printed with a hash-backslash notation, and the unprintable, opaque, compiled regex is printed using the notation #< ... >.

Here is a part of the grammar:

regexpr : regbranch                     { $$ = if3(cdr($1), 
                                                   cons(compound_s, $1),
                                                   car($1)); }
        | regexpr '|' regexpr           { $$ = list(or_s, $1, $3, nao); }
        | regexpr '&' regexpr           { $$ = list(and_s, $1, $3, nao); }
        | '~' regexpr                   { $$ = list(compl_s, $2, nao); }
        | /* empty */ %prec LOW         { $$ = nil; }
        ;

As you can see, constructing the AST is largely just simple construction of nested lists. This form is very convenient to compile. The top-level function of the NFA-based regex compiler is very readable:

/*
 * Input is the items from a regex form,
 * not including the regex symbol.
 * I.e.  (rest '(regex ...)) not '(regex ...).
 */
static nfa_t nfa_compile_regex(val exp)
{
  if (nullp(exp)) {
    nfa_state_t *acc = nfa_state_accept();
    nfa_state_t *s = nfa_state_empty(acc, 0);
    return nfa_make(s, acc);
  } else if (typeof(exp) == chr_s) {
    nfa_state_t *acc = nfa_state_accept();
    nfa_state_t *s = nfa_state_single(acc, c_chr(exp));
    return nfa_make(s, acc);
  } else if (exp == wild_s) {
    nfa_state_t *acc = nfa_state_accept();
    nfa_state_t *s = nfa_state_wild(acc);
    return nfa_make(s, acc);
  } else {
    val sym = first(exp), args = rest(exp);

    if (sym == set_s) {
      return nfa_compile_set(args, nil);
    } else if (sym == cset_s) {
      return nfa_compile_set(args, t);
    } else if (sym == compound_s) {
      return nfa_compile_list(args);
    } else if (sym == zeroplus_s) {
      nfa_t nfa_arg = nfa_compile_regex(first(args));
      nfa_state_t *acc = nfa_state_accept();
      /* New start state has empty transitions going through
         the inner NFA, or skipping it right to the new acceptance state. */
      nfa_state_t *s = nfa_state_empty(nfa_arg.start, acc);
      /* Convert acceptance state of inner NFA to one which has
         an empty transition back to the start state, and
         an empty transition to the new acceptance state. */
      nfa_state_empty_convert(nfa_arg.accept, nfa_arg.start, acc);
      return nfa_make(s, acc);
    } else if (sym == oneplus_s) {
      /* One-plus case differs from zero-plus in that the new start state
         does not have an empty transition to the acceptance state.
         So the inner NFA must be traversed once. */
      nfa_t nfa_arg = nfa_compile_regex(first(args));
      nfa_state_t *acc = nfa_state_accept();
      nfa_state_t *s = nfa_state_empty(nfa_arg.start, 0); /* <-- diff */
      nfa_state_empty_convert(nfa_arg.accept, nfa_arg.start, acc);
      return nfa_make(s, acc);
    } else if (sym == optional_s) {
      /* In this case, we can keep the acceptance state of the inner
         NFA as the acceptance state of the new NFA. We simply add
         a new start state which can short-circuit to it via an empty
         transition.  */
      nfa_t nfa_arg = nfa_compile_regex(first(args));
      nfa_state_t *s = nfa_state_empty(nfa_arg.start, nfa_arg.accept);
      return nfa_make(s, nfa_arg.accept);
    } else if (sym == or_s) {
      /* Simple: make a new start and acceptance state, which form
         the ends of a spindle that goes through two branches. */
      nfa_t nfa_first = nfa_compile_regex(first(args));
      nfa_t nfa_second = nfa_compile_regex(second(args));
      nfa_state_t *acc = nfa_state_accept();
      /* New state s has empty transitions into each inner NFA. */
      nfa_state_t *s = nfa_state_empty(nfa_first.start, nfa_second.start);
      /* Acceptance state of each inner NFA converted to empty
         transition to new combined acceptance state. */
      nfa_state_empty_convert(nfa_first.accept, acc, 0);
      nfa_state_empty_convert(nfa_second.accept, acc, 0);
      return nfa_make(s, acc);
    } else {
      internal_error("bad operator in regex");
    }
  }
}

How to print parser tree in Yacc (BISON)

1 Answers