I'm having a hard time figuring out where to start with this question. The question is to create a grammar for the language of regular expressions with the alphabet { ‘a’, ‘b’, ‘c’, ‘d’, ‘|’, ‘*’, ‘(‘, ‘)’ }
. Assume 1 line of input (a single string) and no spaces. Empty string is not valid and it should correctly capture the precedence of the operators which from lowest to highest is union ('|'), concatenation, and kleene ('*
'). What would be the best way to start writing the grammar for this? Any input would be greatly appreciated.
Not looking for an answer, just not sure how to start. So far I have something like:
S -> S'('S')'S
| A
A -> AA
| A'*'
| T
T -> T'|'T
| X
X -> 'a'
| 'b'
| 'c'
| 'd'
But I'm not sure if I'm even on the right track.
EDIT: After doing some more work this is the conclusion I've reached:
START -> EXPRESSION
EXPRESSION -> EXPRESSION'|'EXPRESSION
EXPRESSION -> PARENTHETICAL'*'
EXPRESSION -> PARENTHETICAL
EXPRESSION -> EXPRESSION PARENTHETICAL
EXPRESSION -> PARENTHETICAL EXPRESSION
EXPRESSION -> EXPRESSION PARENTHETICAL EXPRESSION
EXPRESSION -> REPEAT
PARENTHETICAL -> PARENTHETICAL'*'
PARENTHETICAL -> '('EXPRESSION')'
REPEAT -> REPEAT REPEAT
REPEAT -> TERMINAL'*'
REPEAT -> TERMINAL
TERMINAL -> TERMINAL'*'
TERMINAL -> 'a'
TERMINAL -> 'b'
TERMINAL -> 'c'
TERMINAL -> 'd'
This can also be written as:
S -> E
E -> E'|'E
E -> P'*'
E -> P
E -> E P
E -> P E
E -> E P E
E -> R
P -> P'*'
P -> '('E')'
R -> R R
R -> T'*'
R -> T
T -> T'*'
T -> 'a'
T -> 'b'
T -> 'c'
T -> 'd'
I'm fairly certain this is right, and I double checked it with a bunch of varied test inputs. Any confirmation would be appreciated.