4
votes

I'm trying to formalize some properties on regular expressions (REs) using Coq. But, I've got some troubles to prove a rather simple property:

For all strings s, if s is in the language of (epsilon)* RE, then s = "", where epsilon and * denotes the empty string RE and Kleene star operation.

This seems to be an obvious application of induction / inversion tactics, but I couldn't make it work.

The minimal working code with the problematic lemma is in the following gist. Any tip on how should I proceed will be appreciated.

EDIT:

One of my tries was something like:

Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.  
  intros s H.
  inverts* H.
  inverts* H2.
  inverts* H1.
  inverts* H1.
  inverts* H2.
  simpl in *.
  -- stuck here

that leave me with the following goal:

s' : string
H4 : s' <<- (#1 ^*)
============================
s' = ""

At least to me, it appears that using induction would finish the proof, since I could use H4 in induction hypothesis to finish the proof, but when I start the proof using

induction H

instead of

inverts* H

I got some (at least for me) senseless goals. In Idris / Agda, such proof just follows by pattern matching and recursion over the structure of s <<- (#1 ^*). My point is how to do such recursion in Coq.

3
Did you manage to prove simpler properties in your theory? When did it fail? DId you do any research?Dmitri Chubarov
I started my try with induction H and could proof each subgoal with subst; reflexivity except for one, the case inChr. I get stuck in the same situation when I try induction s. Is it possible that the representation is not as intended?ichistmeinname
I'm thinking your theorem is a bit too specialized. Coq doesn't handle well induction on term that are not "all variables". Here your ` s <<- (#1 ^*)` should be replaced with ` s <<- r -> r = (#1 ^*)` so that recursion on r is a variable, not an instantiated term. You'll have lots of dummy cases but the induction hypothesis should be easier to use. You might also have a look at dependent induction.Vinz
@Vinz The problem is that in the induction hypothesis r becomes #1 :+: ((#1 @ (#1 ^*)) so you have as a precondition the spurious #1 :+: ((#1 @ (#1 ^*)) = (#1 ^*) which means that it cannot be used anymore.gallais
Also dependent induction will introduce a dependency on the JMeq axiom.Anton Trunov

3 Answers

4
votes

Here is one possible solution of the original problem:

Lemma star_lemma : forall s,
    s <<- (#1 ^*) -> s = "".
Proof.
  refine (fix star_lemma s prf {struct prf} : s = "" := _).
  inversion_clear prf; subst.
  inversion_clear H; subst.
  - now inversion H0.
  - inversion_clear H0; subst. inversion_clear H; subst.
    rewrite (star_lemma s' H1).
    reflexivity.
Qed.

The main idea is to introduce a term in the context which will resemble the recursive call in a typical Idris proof. The approaches with remember and dependent induction don't work well (without modifications of in_regex) because they introduce impossible to satisfy equations as induction hypotheses' premises.

Note: it can take a while to check this lemma (around 40 seconds on my machine under Coq 8.5pl3). I think it's due to the fact that the inversion tactic tends to generate big proof terms.

4
votes

This problem has obsessed me for a week, and I have finally found a solution that I find elegant.

I had already read that when an induction principle does not fit your needs, you can write and prove another one, more adapted to your problem. That is what I have done in this case. What we would want is the one obtained when using the more natural definition given in this answer. By doing this, we can keep the same definition (if changing it implies too many changes, for example) and reason about it more easily.

Here is the proof of the induction principle (I use a section to specify precisely the implicit arguments, since otherwise I observe strange behaviours with them, but the section mechanism is not necessary at all here).

Section induction_principle.

Context (P : string -> regex -> Prop)
  (H_InEps : P "" #1)
  (H_InChr : forall c, P (String c "") ($ c))
  (H_InCat : forall {e e' s s' s1}, s <<- e -> P s e -> s' <<- e' ->
    P s' e' -> s1 = s ++ s' -> P s1 (e @ e'))
  (H_InLeft : forall {s e e'}, s <<- e -> P s e -> P s (e :+: e'))
  (H_InRight : forall {s' e e'}, s' <<- e' -> P s' e' -> P s' (e :+: e'))
  (H_InStar_Eps : forall e, P "" (e ^*))
  (H_InStar_Cat : forall {s1 s2 e}, s1 <<- e -> s2 <<- (e ^*) ->
    P s1 e -> P s2 (e ^*) -> P (s1++s2) (e ^*)).

Arguments H_InCat {_ _ _ _ _} _ _ _ _ _.
Arguments H_InLeft {_ _ _} _ _.
Arguments H_InRight {_ _ _} _ _.
Arguments H_InStar_Cat {_ _ _} _ _ _ _.

Definition in_regex_ind2 : forall (s : string) (r : regex), s <<- r -> P s r.
Proof.
  refine (fix in_regex_ind2 {s r} prf {struct prf} : P s r :=
    match prf with
    | InEps => H_InEps
    | InChr c => H_InChr c
    | InCat prf1 prf2 eq1 =>
        H_InCat prf1 (in_regex_ind2 prf1) prf2 (in_regex_ind2 prf2) eq1
    | InLeft _ prf => H_InLeft prf (in_regex_ind2 prf)
    | InRight _ prf => H_InRight prf (in_regex_ind2 prf)
    | InStar prf => _
    end).
  inversion prf; subst.
  - inversion H1. apply H_InStar_Eps.
  - inversion H1; subst.
    apply H_InStar_Cat; try assumption; apply in_regex_ind2; assumption.
Qed.

End induction_principle.

And it turned out that the Qed of this proof was not instantaneous (probably due to inversion producing large terms as in this answer), but took less than 1s (maybe because the lemma is more abstract).

The star_lemma becomes nearly trivial to prove (as soon as we know the remember trick), as with the natural definition.

Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
  intros s H. remember (#1 ^*) as r.
  induction H using in_regex_ind2; try discriminate.
  - reflexivity.
  - inversion Heqr; subst.
    inversion H. rewrite IHin_regex2 by reflexivity. reflexivity.
Qed.
4
votes

I modified a bit the definition of your in_regex predicate:

Inductive in_regex : string -> regex -> Prop :=
| InEps
  : "" <<- #1
| InChr
  : forall c
  , (String c EmptyString) <<- ($ c)
| InCat
  :  forall e e' s s' s1
  ,  s <<- e
  -> s' <<- e'
  -> s1 = s ++ s'
  -> s1 <<- (e @ e')
| InLeft
  :  forall s e e'
  ,  s <<- e
  -> s <<- (e :+: e')
| InRight
  :  forall s' e e'
  ,  s' <<- e'
  -> s' <<- (e :+: e')
| InStarLeft
  : forall e
  , "" <<- (e ^*)
| InStarRight
  :  forall s s' e
  ,  s <<- e
  -> s' <<- (e ^*)
  -> (s ++ s') <<- (e ^*)
where "s '<<-' e" := (in_regex s e).

and could prove your lemma:

Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
  intros s H.
  remember (#1 ^*) as r.
  induction H; inversion Heqr; clear Heqr; trivial.
  subst e.
  rewrite IHin_regex2; trivial.
  inversion H; trivial.
Qed.

Some explanations are necessary.

  1. I did an induction on H. The reasoning is: if I have a proof of s <<- (#1 ^*) then this proof must have the following form...

  2. The tactic remember create a new hypothesis Heqr which, combined with inversion will help get rid of cases which cannot possibly give this proof (in fact all the cases minus the ones where ^* is in the conclusion).

  3. Unfortunately, this path of reasoning does not work with the definition you had for the in_regex predicate because it will create an unsatisfiable condition to the induction hypothesis. That's why I modified your inductive predicate as well.

  4. The modified inductive tries to give a more basic definition of being in (e ^*). Semantically, I think this is equivalent.

I would be interested to read a proof on the original problem.