I had a bit of trouble understanding your question. It would have been much much easier if you had described the problem, rather than just vaguely asserting that it might have something to do with commas. Of course, it does have to do with commas: your grammar insists that there be a comma after ifs
as well as after port
. (Also, either you didn't copy-and-paste correctly or there is something really weird about ifs
.)
Actually, you don't need ifs
, but you need to distinguish between an if
at the end of a declaration, and any other if
:
port_declaration: tail_ports;
ports: port ','
| port ',' ports
| if
| if ports
;
tail_ports
: port
| tail_if
| port ',' tail_ports
| if tail_ports
;
if : IF '(' condition ')' ports END_IF
;
tail_if
: IF '(' condition ')' tail_ports END_IF
;
(I wrote that right-recursively since it's easier to understand, IMHO, and it's also easier to build the AST out of right-recursive lists. See Levine, p. 66. On modern machines, there is really no problem with the parse stack growing; the default maximum stack depth is 10,000; and you can increase it if that's not enough.)
Edit: I think I finally got the above right.
This sort of problem is usually solved by parsing in two passes; first a pre-processing pass, whose output is fed into the real parser. Tokenizing is only done once; the preprocessing parser doesn't differentiate between language tokens; it only cares about it's own tokens. (The C preprocessor can fuse tokens with the ##
operator but it never splits a token; that generally turns out to be a reasonable approach.) This can be implemented with bison, although getting the dataflow right is a bit of a challenge; I personally prefer the lemon
approach where the lexer pushes tokens to the parser, since that is a lot easier to extend.
In your particular case, though, you want the preprocessor to only accept complete phrases (port
) so the challenge, as you say, is getting the ,
s right, even when the last port
is buried deep inside an if
construction. Hence the above approach which differentiates between if
and tail_if
.
There are several approaches to building an AST, depending on whether you can evaluate the conditions at parse-time or not. If you can, then you can just leave the unneeded ports out completely, which is likely the simplest solution.
If you need to do the entire parse before applying the conditions, then you still have a couple of options: the obvious one of building a tree whose nodes are either port
or conditional_list
, or the code-generation style in which you effectively build a vector of operations build-port
and jump-if-condition-false
. Both of these options require a tagged union of some form.
/Edit (The following is still incorrect.)
typedef struct PortList {
Port* port;
struct PortList* next;
} PortList;
typedef struct ConditionalPortList {
Condition* cond; /* Optional condition; if absent, unconditional */
PortList* ports;
} ConditionalPortList;
typedef struct PortDeclaration {
ConditionalPortList* clause;
struct PortDeclaration *next;
} PortDeclaration;
That's easy to build (and it might be even easier than this snippet, I'm a bit out of practice.) (Warning: untried):
%union {
PortList* port_list;
ConditionalPortList* conditional;
PortDeclaration* port_declaration;
/* ... */
}
%type <port_list> ports
%type <conditional> if always
%type <port_declaration> port_tail port_declaration
/* ... */
%%
ports: port { $$ = NewPortList($1, NULL); }
| port ',' ports { $$ = NewPortList($1, $3); }
;
if: IF '(' condition ')' ports ENDIF
{ $$ = NewConditionalPortList($3, $5); }
;
always: ports { $$ = NewConditionalPortList(NULL, $1); }
;
port_tail: if
| if port_tail { $$ = NewPortDeclaration($1, $2); }
| if always port_tail
{ $$ = NewPortDeclaration($1, NewPortDeclaration($2, $3); }
;
port_declaration:
'(' always ')' ';' { $$ = $2; }
| '(' port_tail ')' ';' { $$ = $2; }
| '(' always port_tail ')' ';'
{ $$ = NewPortDeclaration($2, $3); }
;