1
votes

I am trying to create a parser using flex/bison for a verilog-like language (or more specifically verilog with some extra constructs, that is used to generate verilog code) and I am having trouble figuring out how to construct my bison rules to handle a general type of constructs that are available in the language.

A general example of this construct is this:

(
output wire          up_o,
input  wire [4:0]    up_i,
///////////////////////////////////////////////////////////
// A comment
///////////////////////////////////////////////////////////
`my_if (`$a eq `$b)
  input  wire        a_{n}_clk_i,
  output wire        a_{n}_rst_o,
`my_endif
output wire                                           out_o
);

In principle the my_if can be nested and within the ( ); everything can be within a my_if block, without any my_if. This leads me to believe that my current idea is flawed and that i simply do not understand how to do this mutual recursion.

What seems to causing me problems is the , that is terminating every line within the block except for the final line. My lexer ignores comments and tokenize the lines fine, I also parse the my_if line specially.

Below is the relevant bison rules I have now,

portdecla:
'(' ports ')' ';'
;

ports:
port
| ports ',' port
| ports ',' ifs
| ifs
;

ifs:
ifhead ports TENDIF
| if
;

The ifhead and port parse the my_if and input/output lines fines.

Any ideas on how to create these rules would be very much appreciated

1

1 Answers

3
votes

I had a bit of trouble understanding your question. It would have been much much easier if you had described the problem, rather than just vaguely asserting that it might have something to do with commas. Of course, it does have to do with commas: your grammar insists that there be a comma after ifs as well as after port. (Also, either you didn't copy-and-paste correctly or there is something really weird about ifs.)

Actually, you don't need ifs, but you need to distinguish between an if at the end of a declaration, and any other if:

port_declaration: tail_ports;

ports:  port ','
     |  port ',' ports
     |  if
     |  if ports
     ;
tail_ports
     :  port
     |  tail_if
     |  port ',' tail_ports
     |  if tail_ports
     ;
if   :  IF '(' condition ')' ports END_IF
     ;
tail_if
     :  IF '(' condition ')' tail_ports END_IF
     ;

(I wrote that right-recursively since it's easier to understand, IMHO, and it's also easier to build the AST out of right-recursive lists. See Levine, p. 66. On modern machines, there is really no problem with the parse stack growing; the default maximum stack depth is 10,000; and you can increase it if that's not enough.)

Edit: I think I finally got the above right.

This sort of problem is usually solved by parsing in two passes; first a pre-processing pass, whose output is fed into the real parser. Tokenizing is only done once; the preprocessing parser doesn't differentiate between language tokens; it only cares about it's own tokens. (The C preprocessor can fuse tokens with the ## operator but it never splits a token; that generally turns out to be a reasonable approach.) This can be implemented with bison, although getting the dataflow right is a bit of a challenge; I personally prefer the lemon approach where the lexer pushes tokens to the parser, since that is a lot easier to extend.

In your particular case, though, you want the preprocessor to only accept complete phrases (port) so the challenge, as you say, is getting the ,s right, even when the last port is buried deep inside an if construction. Hence the above approach which differentiates between if and tail_if.

There are several approaches to building an AST, depending on whether you can evaluate the conditions at parse-time or not. If you can, then you can just leave the unneeded ports out completely, which is likely the simplest solution.

If you need to do the entire parse before applying the conditions, then you still have a couple of options: the obvious one of building a tree whose nodes are either port or conditional_list, or the code-generation style in which you effectively build a vector of operations build-port and jump-if-condition-false. Both of these options require a tagged union of some form.

/Edit (The following is still incorrect.)

typedef struct PortList {
  Port* port;
  struct PortList* next;
} PortList;

typedef struct ConditionalPortList {
  Condition* cond;  /* Optional condition; if absent, unconditional */
  PortList*  ports;
} ConditionalPortList;

typedef struct PortDeclaration {
  ConditionalPortList* clause;
  struct PortDeclaration *next;
} PortDeclaration;

That's easy to build (and it might be even easier than this snippet, I'm a bit out of practice.) (Warning: untried):

%union {
   PortList*            port_list;
   ConditionalPortList* conditional;
   PortDeclaration*     port_declaration;
   /* ... */
}
%type <port_list> ports
%type <conditional> if always
%type <port_declaration> port_tail port_declaration
/* ... */

%%

ports:  port             { $$ = NewPortList($1, NULL); }
     |  port ',' ports   { $$ = NewPortList($1, $3); }
     ;

if: IF '(' condition ')' ports ENDIF
                         { $$ = NewConditionalPortList($3, $5); }
  ;

always: ports            { $$ = NewConditionalPortList(NULL, $1); }
      ;

port_tail: if
         | if port_tail  { $$ = NewPortDeclaration($1, $2); }
         | if always port_tail
                         { $$ = NewPortDeclaration($1, NewPortDeclaration($2, $3); }
         ;

port_declaration:
           '(' always ')' ';'    { $$ = $2; }
         | '(' port_tail ')' ';' { $$ = $2; }
         | '(' always port_tail ')' ';'
                                 { $$ = NewPortDeclaration($2, $3); }
         ;