Context Free Grammar Parse Tree

Question

I have strings of equal numbers x and y. My CFG should accept them in the forms xy, xyxy, xyxyxy, xxxyyy, and xxyxyy.

I have come up with these production rules:

S --> SAB |e

A --> xSy |e

B --> ySx |e

Im working on creating the parse tree, but I am not completely understanding. This is what I have done

                                      S
                                   /  |  \
                                  S   A   B
                                 /  / | \   \
                               e   x  S  y    e
                                    / | \
                                  S   A   B
                                 /  / | \   \
                                e  x  S  y   e
                                      |
                                      e

If I am understanding correctly the above parse tree represents xyxy.... and so one if I continue

How does this represent xxyy?

How can it represent xxyxyy?

This is what I am not understanding...

Welcome to StackExchange! You may want to ask these kind of questions in the Computer Science StackExchange cs.stackexchange.com — wizclown
How did you come up with those rules? Can you explain how they work, independent of the parse tree? — rici

nobody nobody · Accepted Answer · 2014-09-28T00:12:36

First, I'm not sure what's the correct reading order for the tree – I assume it's in-order, cause it usually is with almost everything (compare the Wikipedia article on tree traversals) but I'm not absolutely certain that's right. (If this is wrong, you'll want different trees but the ideas should translate.)

Under that reading, your example tree represents xxyy. By substituting further S-A-S-A-… chains for the bottommost ε, you'd get an arbitrary chain of xx…yy.

To represent xy…xy, you'd alternate expanding As and Bs to xSy or ySx, that is: S → SAB, then B → ε and A → xSy; on the next layer expand S → SAB again but instead now do A → ε and B -> ySx; continue as needed. An example:

            S
           /|\
          / | \
         S  A  B
        /  /|\  \
       ε  / | \  ε
         x  S  y
           /|\
          / | \
         S  A  B
        /  /  /|\
       ε  ε  / | \
            y  S  x
               |
               ε

This should match xyxy, and can again be expanded at the bottom for longer sequences.

To save on space, I'll call the longest path from the root in the parse tree its "spine". (For the simple examples here, all off-"spine" paths are conveniently terminal symbols, so this description of the trees is unique (enough). In general, the 'side paths' are more complex so this definition is usually useless. Here, it works nicely.)

For the tree drawn here, its "spine" would be SASBSε (just read down the middle). Extended for arbitrary nesting, that would become (SASB)^*(SA)^?Sε.

The tree you've drawn above has the "spine" SASASε, for arbitrarily long sequences of xxx…yyy, you'd get (SA)^*Sε.

xxyxyy is easily described in the same way: its "spine" is SASASBSε.

In general (for this grammar), for any sequence that's "inverse-mirrored" (the second half is the first half reversed and with x/y exchanged – i.e. x|y, xyx|yxy, xxy|xyy, …), you can construct this "spine" by taking just the first half and expanding x → SA, y → SB and adding Sε when you've reached the middle. (Of course, for more complex patterns like xxyyyx this fails. Finding a way to quickly come up with a tree for these asymmetric patterns is left as an exercise for the reader ;-).)

Context Free Grammar Parse Tree

1 Answers