General problem
I've noticed several times that push-pop
scopes that have already been popped appear to affect the time that a check-sat
in a subsequent scope needs.
That is, assume a program with multiple (potentially arbitrarily nested) push-pop scopes, each of which contain a check-sat command. Furthermore, assume that the second check-sat takes 10s, whereas the first one takes only 0.1s.
...
(push)
(assert (not P))
(check-sat) ; Could be sat, unsat or unknown
(pop)
...
(push)
(assert (not Q))
(check-sat) ; Could be sat, unsat or unknown
(pop)
After commenting the first push-pop scope, the second check-sat only takes 1s. Why is that?
AFAIK, Z3 switches to incremental solvers if push-pop scopes are used. Is there a (conceptual) reason for why they might behave this way?
I was told once that Z3 attributes symbols by importance, which affects proof search (and the importance of a symbol can also change during proof search). Could that be the reason? Is it possible to reset importance (in between scopes)?
Could it be a bug? I found this post where Leonardo mentions a bug that seems related (his answer is from 2012, though).
Particular instance
I unfortunately only have fairly long (automatically generated) SMTLib files that illustrate the behaviour, one of which you can find as this gist. It uses quantifiers and uninterpreted functions, but neither mbqi
nor arrays or bit vectors. The example consists of 148 nested push-pop scopes and 89 check-sats, and it takes Z3 4.3.2 about 8s to process it. The last check-sat (prefixed by an echo
) takes by far the longest.
I randomly commented several push-pop scopes (one at a time, never the last one, make sure you don't comment symbol declarations), and in most cases, the overall run time went down to less than 1s. That is, the last check-sat was done significantly faster.
To provide further details, I compared a run of Z3 with all scopes (slow, 8s) with a run of Z3 where the scope marked by [XXX]
had been commented (fast, 0.3s). The results can be seen in this diff (left is slow, right is fast).
The diff shows that all check-sats behave the same (I faked the commented one by echo'ing "unsat"), from which I conclude that the commented scope affects the proof search, but not its final outcome.
I also tried to make some sense out of differences in the obtained statistics, but I know very little about how to interpret the statistics correctly. Here are the few statistics I could make sense of:
grobner
(383 vs 36) andnonlinear-horner
(342 vs 25), so it seems that the slower run performs more arithmetic-related operations. The commented scope is indeed about to non-linear arithmetic (and so are lots of others), but the particular proof in the commented scope should be "trivial", it is essentially showing thatx != 0
for anx
about which0 < x
has been assumed explicitly.memory
(40 vs 7), which I interpret as indicating that Z3 explores a larger search space in the slow version of the programquant-instantiations
(43k vs 51k), which surprised me because the significantly faster run nevertheless triggered notably more quantifier instantiations.