
I'm new to Isabelle/HOL (although not to HOL), so I decided to start learning by working through the examples in the excellent "prog-prove" tutorial.

I've got stuck on a context-free grammar question (exercise 3.5 on page 43). In this question, you are given two context-free grammars:

S → ε | aSb | SS
T → ε | TaTb

and you are asked to prove they describe the same language. Proving that elements from T lie in the language described by S is easy, but I'm struggling with the other direction. In particular, I want to show that the language described by T is closed under concatenation.

Here's an informal proof: Suppose u and v lie in the language. We want to prove that uv lies in the language. Proceed by induction on the length of v. If length v = 0, then v is the empty string, so uv = u and we are done. Otherwise, look at the production rules for T. Clearly v isn't the empty string, so it must be of the form waxb for some w, x in the language. Since w is shorter than v, (strong) induction shows that uw must lie in the language. Now use the second production rule.

This proof seems quite complicated, so I suspect that it's not the one the author of prog-prove had in mind. So I have two questions:

  1. What (informally or formally) is the proof that the author of the exercise expects you to use?

  2. (More technical) How would I express my informal proof above in Isabelle/HOL?

My attempt / flailing thus far starts with the following:

lemma T_mult: "cfT u ==> cfT v ==> cfT (u @ v)"
  apply (induction "length v")
  apply (auto)
  apply (induction v rule: cfT.induct)

and the second induction application fails, but I assume that this is because I'm asking Isabelle to do the wrong thing...
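
I suspect the strong induction on the length of v should be set up in a single step with the library rule length_induct rather than with a second induction; my best guess at a skeleton is the following (the case analysis inside is exactly the part I don't know how to fill in):

lemma T_mult:
  assumes "cfT u" and "cfT v"
  shows "cfT (u @ v)"
  using assms
proof (induct v rule: length_induct)
  case (1 v)
  (* v is either empty or of the form w a x b for some shorter
     w, x in the language; the IH applies to the shorter w *)
  show ?case sorry
qed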


1 Answer


I can't say what the authors expected, but let me comment anyway ;) First, what's your definition of T and S? I'll just assume the following:

datatype alphabet = a | b

inductive S
where
  empty [simp]: "S []" |
  betw: "S x ⟹ S (a # x @ [b])" |
  conc: "S x ⟹ S y ⟹ S (x @ y)"

inductive T
where
  empty [simp]: "T []" |
  conc_betw: "T x ⟹ T y ⟹ T (x @ a # y @ [b])"
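
As a quick sanity check that this encoding matches the grammar, the string ab is derivable in both (these two throwaway lemmas are just my addition):

lemma "S [a, b]"
  using S.betw [OF S.empty] by simp

lemma "T [a, b]"
  using T.conc_betw [OF T.empty T.empty] by simp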

You are right that T x ==> T y ==> T (x @ y) is the crucial lemma. As for proving it: since these are inductive definitions, empirically (and heuristically) their respective induct rules are the most likely candidates for proving facts about them by induction. I have the feeling that other induction schemes, like your induction over the natural number length v, complicate things unnecessarily.

Informally, for proving

lemma
  assumes "T x" and "T y"
  shows "T (x @ y)"

I would first do a case analysis on the structure of T x (according to the cases of the grammar; the corresponding fact is T.cases) and then induction over T y (using the rule T.induct), but there might be other ways to do it. The general structure would be

using assms
proof (cases rule: T.cases)
  case empty
  ...
next
  case (conc_betw u v)
  with `T y` show ?thesis
    apply (induct rule: T.induct)
    ...
qed

Update: Actually, the case analysis over the structure of x turns out to be superfluous. Informally, I would argue as follows. Assume that T x and T y hold. Now apply induction over the structure of y according to the formation rules of T.

  • y = []. Then we are done since T (x @ y) = T (x @ []) = T x, which is one of the assumptions.

  • y = u @ a # v @ [b]. By induction hypothesis (IH) we have T u, T v, T (x @ u), and T (x @ v). Now instantiating the second formation rule of T we have

    T (x @ u) ==> T v ==> T ((x @ u) @ a # v @ [b])
    

    which simplifies to T (x @ u) ==> T v ==> T (x @ u @ a # v @ [b]) (by associativity of @) and thus allows us to reduce our main goal, which happens to be T (x @ u @ a # v @ [b]), to the two subgoals T (x @ u) and T v, both of which hold by IH.
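
Putting the pieces together, one way (of several) to write the whole proof in Isar is the following; the lemma name T_concat is my own choice:

lemma T_concat:
  assumes "T x" and "T y"
  shows "T (x @ y)"
  using `T y`
proof (induct rule: T.induct)
  case empty
  (* y = []: the goal collapses to the assumption T x *)
  from `T x` show ?case by simp
next
  case (conc_betw u v)
  (* the IH gives T (x @ u); combine it with T v via conc_betw *)
  then have "T ((x @ u) @ a # v @ [b])"
    by (blast intro: T.conc_betw)
  (* and finish by associativity of @ *)
  then show ?case by simp
qed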