Grammar for language of regular expressions with given alphabet

Question

I'm having a hard time figuring out where to start with this question. The question is to create a grammar for the language of regular expressions with the alphabet { ‘a’, ‘b’, ‘c’, ‘d’, ‘|’, ‘*’, ‘(‘, ‘)’ }. Assume 1 line of input (a single string) and no spaces. Empty string is not valid and it should correctly capture the precedence of the operators which from lowest to highest is union ('|'), concatenation, and kleene ('*'). What would be the best way to start writing the grammar for this? Any input would be greatly appreciated.

Not looking for an answer, just not sure how to start. So far I have something like:

S -> S'('S')'S  
   | A  
A -> AA  
   | A'*'  
   | T  
T -> T'|'T  
   | X  
X -> 'a'  
   | 'b'  
   | 'c'  
   | 'd'

But I'm not sure if I'm even on the right track.

EDIT: After doing some more work this is the conclusion I've reached:

START -> EXPRESSION  
EXPRESSION -> EXPRESSION'|'EXPRESSION  
EXPRESSION -> PARENTHETICAL'*'  
EXPRESSION -> PARENTHETICAL  
EXPRESSION -> EXPRESSION PARENTHETICAL  
EXPRESSION -> PARENTHETICAL EXPRESSION  
EXPRESSION -> EXPRESSION PARENTHETICAL EXPRESSION  
EXPRESSION -> REPEAT  
PARENTHETICAL -> PARENTHETICAL'*'  
PARENTHETICAL -> '('EXPRESSION')'  
REPEAT -> REPEAT REPEAT
REPEAT -> TERMINAL'*'  
REPEAT -> TERMINAL  
TERMINAL -> TERMINAL'*'  
TERMINAL -> 'a'  
TERMINAL -> 'b'  
TERMINAL -> 'c'  
TERMINAL -> 'd'

This can also be written as:

S -> E  
E -> E'|'E  
E -> P'*'  
E -> P  
E -> E P  
E -> P E 
E -> E P E  
E -> R  
P -> P'*'  
P -> '('E')'  
R -> R R
R -> T'*'  
R -> T  
T -> T'*'  
T -> 'a'  
T -> 'b'  
T -> 'c'  
T -> 'd'

I'm fairly certain this is right, and I double checked it with a bunch of varied test inputs. Any confirmation would be appreciated.

Yea its homework. Not looking for an answer, just not sure how to start. So far I have something like: S -> S'('S')'S | A A -> But I'm not sure if I'm even on the right track. — Tony Dakhoul
@TonyDakhoul include that last comment in your question so people read it! It's always a good idea to include directly in your question what you tried ! — Hugo Dozois

FrankieTheKneeMan FrankieTheKneeMan · Accepted Answer · 2013-03-22T17:48:53

This is just a question of iteration. I always like to start with the rule Start-> NonTerminal, just to give myself half a chance at a regular grammar. Nested parentheticals I believe will reduce us to a Context Free Grammar, but that's fine. I find it makes that step easier. Then, you have to start with the LOWEST precendence operator. Parenthetical, and Union in your case. So, initially:

Start -> Expression
Expression -> Expression|Expression
Expression -> (Expression)

Note that I have eschewed normal '|' notation because it's one of the symbols in your alphabet. This avoids confusion. Then, we can add the next operator, concatenation, with a set of rules that generate repetion.

Start -> Expression
Expression -> Expression|Expression
Expression -> (Expression)

Expression -> Repeat
Repeat -> RepeatRepeat
Repeat -> Terminal
Terminal -> a
Terminal -> b
Terminal -> c
Terminal -> d

Then, all that's left is to add the Kleene Star as the lowest precedence operator:

Start -> Expression
Expression -> Expression|Expression
Expression -> (Expression)

Expression -> Repeat
Repeat -> RepeatRepeat
Repeat -> Terminal
Terminal -> a
Terminal -> b
Terminal -> c
Terminal -> d

Repeat -> Terminal*
Expression -> (Expression)*

And we have a grammar that generates all non-empty regular expressions from the desired alphabet. In more traditional variable notation, it might look like this:

S -> E
E -> E|E
E -> (E)
E -> (E)*
E -> R
R -> RR
R -> T*
R -> T
T -> a
T -> b
T -> c
T -> d

There's a grammar I'm pretty sure does it. Hope someone will check my work, because it's been a few years since I wrote a formal grammar.

EDIT:

As a result of said double checking, it was pointed out that concatenating Expressions and Parenthetical Expressions wasn't working. However, we can add that. The first step is to add a new rule Expression -> Parenthetical and change Expression -> (Expression) and Expression -> (Expression)* to Parenthetical -> (Expression) and Parenthetical -> (Expression)* Then, A couple quick rules for concatenating them leads to the ruleset:

Start -> Expression
Expression -> Expression|Expression
Expression -> Parenthetical

Parenthetical -> (Expression)
Parenthetical -> (Expression)*

Expression -> Expression Parenthetical
Expression -> Parenthetical Expression
Expression -> Expression Parenthetical Expression

Expression -> Repeat
Repeat -> Repeat Repeat
Repeat -> Terminal
Terminal -> a
Terminal -> b
Terminal -> c
Terminal -> d

Repeat -> Terminal*

Grammar for language of regular expressions with given alphabet

2 Answers