2
votes

General problem

I've noticed several times that push-pop scopes that have already been popped appear to affect the time that a check-sat in a subsequent scope needs.

That is, assume a program with multiple (potentially arbitrarily nested) push-pop scopes, each of which contain a check-sat command. Furthermore, assume that the second check-sat takes 10s, whereas the first one takes only 0.1s.

...
(push)
  (assert (not P))
  (check-sat) ; Could be sat, unsat or unknown
(pop)
...
(push)
  (assert (not Q))
  (check-sat) ; Could be sat, unsat or unknown
(pop)

After commenting the first push-pop scope, the second check-sat only takes 1s. Why is that?

  • AFAIK, Z3 switches to incremental solvers if push-pop scopes are used. Is there a (conceptual) reason for why they might behave this way?

  • I was told once that Z3 attributes symbols by importance, which affects proof search (and the importance of a symbol can also change during proof search). Could that be the reason? Is it possible to reset importance (in between scopes)?

  • Could it be a bug? I found this post where Leonardo mentions a bug that seems related (his answer is from 2012, though).

Particular instance

I unfortunately only have fairly long (automatically generated) SMTLib files that illustrate the behaviour, one of which you can find as this gist. It uses quantifiers and uninterpreted functions, but neither mbqi nor arrays or bit vectors. The example consists of 148 nested push-pop scopes and 89 check-sats, and it takes Z3 4.3.2 about 8s to process it. The last check-sat (prefixed by an echo) takes by far the longest.

I randomly commented several push-pop scopes (one at a time, never the last one, make sure you don't comment symbol declarations), and in most cases, the overall run time went down to less than 1s. That is, the last check-sat was done significantly faster.

To provide further details, I compared a run of Z3 with all scopes (slow, 8s) with a run of Z3 where the scope marked by [XXX] had been commented (fast, 0.3s). The results can be seen in this diff (left is slow, right is fast).

The diff shows that all check-sats behave the same (I faked the commented one by echo'ing "unsat"), from which I conclude that the commented scope affects the proof search, but not its final outcome.

I also tried to make some sense out of differences in the obtained statistics, but I know very little about how to interpret the statistics correctly. Here are the few statistics I could make sense of:

  • grobner (383 vs 36) and nonlinear-horner (342 vs 25), so it seems that the slower run performs more arithmetic-related operations. The commented scope is indeed about to non-linear arithmetic (and so are lots of others), but the particular proof in the commented scope should be "trivial", it is essentially showing that x != 0 for an x about which 0 < x has been assumed explicitly.

  • memory (40 vs 7), which I interpret as indicating that Z3 explores a larger search space in the slow version of the program

  • quant-instantiations (43k vs 51k), which surprised me because the significantly faster run nevertheless triggered notably more quantifier instantiations.

1

1 Answers

4
votes

I'm not exactly sure whether this is an observation or a question? Yes, Z3 will behave differently for different inputs and push/pop are not "innocent", i.e., they will have a major impact on the performance. This is most obvious if they can be removed completely, because that allows Z3 to switch to completely different sub-solvers that don't support incrementality (but are often faster). For example, for purely bit-blasted formulas without scoping, Z3 will use a fast, new SAT solver, but if push/pop are required, it falls back to a much older and slower SAT solver (the implementations of those two solvers are completely disjoint).

Further, removing some of the scopes between others may have a huge impact too, because it allows Z3 to keep more intermediate lemmas, as well as heuristic states. If it is for some reason desired that each query behaves as if no others were there, then it's perhaps best to simply generate independent queries and to start Z3 from scratch on each of them.

More information about the specific issues mentioned:

"Heuristic states" means all kinds of heuristic data Z3 uses, there is a whole plethora of different heuristics at work, not just one particular one like the symbol ordering. Whether it is "good" to keep this information between queries depends entirely on your problems - the heuristics work on some problems, but not on all, as is the nature of heuristics. The whole concept of incrementality is built upon this though: if the heuristics wouldn't help, then we'd be better off running independent queries. However, in some applications, resetting Z3 sometimes is better than no resets or independent queries, for instance, when you have a huge number of tiny queries.

Conceptual reason for switching to a different solver: the first one doesn't support the features you need. See combined_solver.cpp, function check_sat. If solver1 is not used (e.g., if assumptions are provided or if incremental mode is enabled) then solver2 will be used.

combined_solver.solver2_timeout will put a timeout solver2. What happens when solver2 times out is set by the option combined_solver.solver2_unknown. So, yes, you can run solver1 after solver2, but solver1 is also allowed to fail, i.e., return unknown. Looking at the code, it may well be unsound (e.g., ignoring assumptions) if that's used.

Re: the bug report that was mentioned: That was a soundness bug, not a performance bug; one of the solvers said SAT and the other said UNSAT.