Stuck on a simple proof about regular expressions

Question

I'm trying to formalize some properties on regular expressions (REs) using Coq. But, I've got some troubles to prove a rather simple property:

For all strings s, if s is in the language of (epsilon)* RE, then s = "", where epsilon and * denotes the empty string RE and Kleene star operation.

This seems to be an obvious application of induction / inversion tactics, but I couldn't make it work.

The minimal working code with the problematic lemma is in the following gist. Any tip on how should I proceed will be appreciated.

EDIT:

One of my tries was something like:

Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.  
  intros s H.
  inverts* H.
  inverts* H2.
  inverts* H1.
  inverts* H1.
  inverts* H2.
  simpl in *.
  -- stuck here

that leave me with the following goal:

s' : string
H4 : s' <<- (#1 ^*)
============================
s' = ""

At least to me, it appears that using induction would finish the proof, since I could use H4 in induction hypothesis to finish the proof, but when I start the proof using

induction H

instead of

inverts* H

I got some (at least for me) senseless goals. In Idris / Agda, such proof just follows by pattern matching and recursion over the structure of s <<- (#1 ^*). My point is how to do such recursion in Coq.

Did you manage to prove simpler properties in your theory? When did it fail? DId you do any research? — Dmitri Chubarov
I started my try with induction H and could proof each subgoal with subst; reflexivity except for one, the case inChr. I get stuck in the same situation when I try induction s. Is it possible that the representation is not as intended? — ichistmeinname
I'm thinking your theorem is a bit too specialized. Coq doesn't handle well induction on term that are not "all variables". Here your ` s <<- (#1 ^*)` should be replaced with ` s <<- r -> r = (#1 ^*)` so that recursion on r is a variable, not an instantiated term. You'll have lots of dummy cases but the induction hypothesis should be easier to use. You might also have a look at dependent induction. — Vinz
@Vinz The problem is that in the induction hypothesis r becomes #1 :+: ((#1 @ (#1 ^*)) so you have as a precondition the spurious #1 :+: ((#1 @ (#1 ^*)) = (#1 ^*) which means that it cannot be used anymore. — gallais
Also dependent induction will introduce a dependency on the JMeq axiom. — Anton Trunov

Anton Trunov Anton Trunov · Accepted Answer · 2016-11-25T14:30:23

Here is one possible solution of the original problem:

Lemma star_lemma : forall s,
    s <<- (#1 ^*) -> s = "".
Proof.
  refine (fix star_lemma s prf {struct prf} : s = "" := _).
  inversion_clear prf; subst.
  inversion_clear H; subst.
  - now inversion H0.
  - inversion_clear H0; subst. inversion_clear H; subst.
    rewrite (star_lemma s' H1).
    reflexivity.
Qed.

The main idea is to introduce a term in the context which will resemble the recursive call in a typical Idris proof. The approaches with remember and dependent induction don't work well (without modifications of in_regex) because they introduce impossible to satisfy equations as induction hypotheses' premises.

Note: it can take a while to check this lemma (around 40 seconds on my machine under Coq 8.5pl3). I think it's due to the fact that the inversion tactic tends to generate big proof terms.

Stuck on a simple proof about regular expressions

3 Answers