I have 37 similar SMT2 problems, each in two equisatisfiable versions that I call *compact* and *unrolled*. The problems use incremental SMT solving, and every `(check-sat)` returns unsat. The compact versions use the QF_AUFBV logic, the unrolled versions use QF_ABV. I ran them in z3, yices, and boolector (but boolector only supports the unrolled versions). The results of this performance evaluation can be found here:
http://scratch.clifford.at/compact_smt2_enc_r1102.html
The SMT2 files for these examples can be downloaded from here (~10 MB):
http://scratch.clifford.at/compact_smt2_enc_r1102.zip
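For illustration, here is a minimal hand-written sketch of the incremental structure these files share; the declarations and assertions below are made up, the real inputs are in the zip above:

```
; Illustrative sketch only, not taken from the benchmark files:
; one solver context, several (check-sat) calls, each expected
; to return unsat.
(set-logic QF_AUFBV)
(declare-const x (_ BitVec 8))
(push 1)
(assert (distinct (bvadd x #x00) x))  ; x + 0 != x
(check-sat)                           ; unsat
(pop 1)
(push 1)
(assert (distinct (bvand x x) x))     ; x & x != x
(check-sat)                           ; unsat
(pop 1)
```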
I ran each solver 5 times with different values for the `:random-seed` option. (Except boolector, which does not support `:random-seed`; I simply ran boolector 5 times on the same input.) The variation I get when running the solvers with different `:random-seed` values is relatively small (see the +/- values in the table for the max outlier).
There is a wide spread between solvers. Boolector and Yices are consistently faster than z3, in some cases up to two orders of magnitude.
However, my question is about "z3 vs z3" performance. Consider for example the following data points:
| Test Case | Z3 Median Runtime | Max Outlier |
|-------------------|-------------------|-------------|
| insn_add unrolled | 873.35 seconds | +/- 0% |
| insn_add compact | 1837.59 seconds | +/- 1% |
| insn_sub unrolled | 4395.67 seconds | +/- 16% |
| insn_sub compact | 2199.21 seconds | +/- 5% |
The problems insn_add and insn_sub are almost identical. Both are generated from Verilog using Yosys, and the only difference is which Verilog module describes the instruction: insn_add uses the rvfi_insn_add module and insn_sub uses rvfi_insn_sub in its place. Here is the diff between those two source files:
```diff
--- insn_add.v	2017-01-31 15:20:47.395354732 +0100
+++ insn_sub.v	2017-01-31 15:20:47.395354732 +0100
@@ -1,6 +1,6 @@
 // DO NOT EDIT -- auto-generated from generate.py
-module rvfi_insn_add (
+module rvfi_insn_sub (
 input rvfi_valid,
 input [ 32 - 1 : 0] rvfi_insn,
 input [`RISCV_FORMAL_XLEN - 1 : 0] rvfi_pc_rdata,
@@ -29,9 +29,9 @@
 wire [4:0] insn_rd = rvfi_insn[11: 7];
 wire [6:0] insn_opcode = rvfi_insn[ 6: 0];
 
- // ADD instruction
- wire [`RISCV_FORMAL_XLEN-1:0] result = rvfi_rs1_rdata + rvfi_rs2_rdata;
- assign spec_valid = rvfi_valid && insn_funct7 == 7'b 0000000 && insn_funct3 == 3'b 000 && insn_opcode == 7'b 0110011;
+ // SUB instruction
+ wire [`RISCV_FORMAL_XLEN-1:0] result = rvfi_rs1_rdata - rvfi_rs2_rdata;
+ assign spec_valid = rvfi_valid && insn_funct7 == 7'b 0100000 && insn_funct3 == 3'b 000 && insn_opcode == 7'b 0110011;
 assign spec_rs1_addr = insn_rs1;
 assign spec_rs2_addr = insn_rs2;
 assign spec_rd_addr = insn_rd;
```
But their behavior in this benchmark is very different: overall, insn_sub performs much worse than insn_add. Furthermore, for insn_add the unrolled version runs about twice as fast as the compact version, while for insn_sub the compact version runs about twice as fast as the unrolled version.
Here are the times before taking the median; the `:random-seed` setting obviously does not make much of a difference:
```
insn_add unrolled:  868.15  873.34  873.35  873.36  874.88
insn_add compact:  1828.70 1829.32 1837.59 1843.74 1867.13
insn_sub unrolled: 3204.06 4195.10 4395.67 4539.30 4596.05
insn_sub compact:  2003.26 2187.52 2199.21 2206.04 2209.87
```
Since the value of `:random-seed` does not seem to have much of an effect, I assume there is something intrinsic to those .smt2 files that makes them fast or slow on z3. How would I investigate this? How would I find out what makes the fast cases fast and the slow cases slow, so that I can avoid whatever makes the slow cases slow? (Yes, I know that this is a very broad question. Sorry. :)
Edit:
Here are some more concrete questions along the lines of my primary question. These questions are directly inspired by the obvious differences I can see between the (compact) insn_add and insn_sub benchmarks.
- Can the order of `(declare-..)` and `(define-..)` statements in my SMT input influence performance?
- Can changing the names of declared or defined functions influence performance?
- If I split a BV into smaller BVs and then concatenate them back again, can this influence performance?
- If I either compare two BVs for equality, or split the BVs into single-bit variables and compare each of the bits individually, can this influence performance? (See the sketch after this list for the encodings I mean.)
- Also: what operations in z3 actually change when I choose a different value for `:random-seed`?
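To make the two bit-vector questions concrete, here is a tiny hand-written sketch of the encoding variants I mean (a 4-bit toy example, not taken from the benchmarks):

```
(set-logic QF_BV)
(declare-const x (_ BitVec 4))
(declare-const y (_ BitVec 4))
; (a) word-level equality
(assert (= x y))
; (b) the same constraint, one bit at a time
(assert (and (= ((_ extract 0 0) x) ((_ extract 0 0) y))
             (= ((_ extract 1 1) x) ((_ extract 1 1) y))
             (= ((_ extract 2 2) x) ((_ extract 2 2) y))
             (= ((_ extract 3 3) x) ((_ extract 3 3) y))))
; (c) splitting x into halves and concatenating them back is a no-op
(assert (= x (concat ((_ extract 3 2) x) ((_ extract 1 0) x))))
(check-sat) ; sat -- (a) and (b) say the same thing, (c) is trivially true
```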
Making small changes to the .smt2 files without changing the semantics can be very difficult for large test cases generated by complex tools. I'm hoping there are other things I can try first, or maybe there is some existing expert knowledge about the kinds of changes that might be worth investigating. Or alternatively: what kinds of changes would effectively be equivalent to changing `:random-seed`, and thus are not worth investigating?
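For concreteness, here is a hypothetical example (not from the real files) of the kind of semantics-preserving edit I have in mind:

```
(set-logic QF_BV)
; version A
(define-fun inc ((a (_ BitVec 8))) (_ BitVec 8) (bvadd a #x01))
(define-fun dec ((a (_ BitVec 8))) (_ BitVec 8) (bvsub a #x01))
(declare-const v (_ BitVec 8))
(assert (distinct (dec (inc v)) v))
(check-sat) ; unsat
; version B would define dec before inc, or rename inc to incr;
; either edit leaves the problem unchanged, but might perturb the
; solver's internal term ordering, similar in spirit to a seed change.
```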
(Tests were performed with git rev c67cf16, i.e. the current git head of z3, on an AWS EC2 c4.8xlarge instance, built with `make -j40`. The runtimes are CPU seconds, not wall-clock seconds.)
Edit 2:
I now have three test cases (`test1.smt2`, `test2.smt2`, and `test3.smt2`) that are identical except that I've renamed some of the functions I declare/define. The test cases can be found at http://svn.clifford.at/handicraft/2017/z3perf/.
This is a variation of the original problem that takes ~2 minutes to solve instead of ~1 hour. As before, changing the value of `:random-seed` has only a marginal effect, but renaming some of the functions without changing anything else changes the runtime by more than 2x.
I've now opened an issue on GitHub, arguing that `:random-seed` should be tied into whatever internal choice changes when I rename the functions in my SMT2 code.