Let's imagine two easier languages: x^2m # y^m, and x^4m # y^m. What do CFGs for these look like?
T := xxTy | #
F := xxxxFy | #
The languages generated by each of these grammars is a subset of our language. In fact, we can get pretty close by just combining T and F into one start symbol, S:
S := xxxxSy | xxSy | #
This is very close, however, it does not let us generate a prefix of x of odd length. We can fix this by allowing one extra x to be added during the S := xxSy
rule and remembering that we already did this by switching to a new nonterminal R
:
S := xxxxSy | xxSy | xxxRy | #
R := xxxxRy | xxRy | #
We can now show this grammar works by induction:
Base case: the string # (for m = n = 0) is in the language. The next shortest strings in the language are xx#y, xxx#y and xxxx#y, as required.
Induction hypothesis: exactly the required strings are in the language for all m up to and including k.
Induction step: we must show that all required strings are in the language for m = k + 1. We know that all required strings are in the language for y = k; so we have x^2k # y, …, x^4k # y. Adding an extra y to the end means we need at least two extra x on the front (so our minimum is now x^2k+2 # y^(k+1)) and at most we can add four extra x (so our maximum is now x^4k+4 # y^(k+1)). The only "new" numbers of x we introduced were x^4k+1, x^4k+2, x^4k+3 and x^4k+4, and we can get these as follows:
- changing out one application of
S := xxxxSy
in the derivation of x^4k # y^k
for S := xxxRy
and one of R := xxRy
gives us x^4k+1 # y^(k+1)
.
- adding one application of
S := xxSy
in the derivation of x^4k # y^k
gives us x^4k+2 # y^(k+1)
.
- adding one application of
S := xxxRy
in the derivation of x^4k # y^k
gives us x^4k+3 # y^(k+1)
.
- adding one application of
S := xxxxSy
in the derivation of x^4k # y^k
gives us x^4k+4 # y^(k+1)
.