
In my community, we have recently been using the term "falsification" of a formal specification. The term appears, for instance, in: https://www.cs.huji.ac.il/~ornak/publications/cav05.pdf

I wonder whether the Alloy Analyzer does falsification. It seems so to me, but I'm not sure. Is that correct? If not, what is the difference?


2 Answers


Yes, Alloy is a falsifier. Alloy's primary novelty when it was introduced 20 years ago was to argue that falsification is often more important than verification: since most designs are not correct, the role of an analyzer should be to find the errors, not to show that they are not present. For a discussion of this issue, see Section 1.4, "Verification vs. Refutation", in Software Analysis: A Roadmap (Jackson and Rinard, 2000), and Section 5.1.1, "Instance Finding and Undecidability Compromises", in Software Abstractions (Jackson, 2006).

In Alloy's case, though, there's another aspect: the argument that scope-complete analysis is actually quite effective from a verification standpoint. This claim is what we called the "small scope hypothesis" -- that most bugs can be found in small scopes (that is, in analyses bounded by a small, fixed number of elements in each basic type).
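To make the idea of a scoped check concrete, here is a minimal toy Alloy model (my own illustrative sketch, not taken from the papers cited above). The `check` command asks the Analyzer to search for a counterexample within a bounded scope:

```alloy
-- Toy linked-structure model (illustrative only)
sig Node { next: lone Node }

-- Assume the model forbids cycles
fact Acyclic { no n: Node | n in n.^next }

-- Claim to falsify: no node points to itself
assert NoSelfLoop { no n: Node | n.next = n }

-- Bounded analysis: search for a counterexample using at most 6 Nodes.
-- Finding one falsifies the claim; finding none shows only that the
-- claim holds within the scope -- which, per the small scope
-- hypothesis, is often good enough in practice.
check NoSelfLoop for 6
```

Here the fact implies the assertion (a self-loop is a cycle of length one), so the Analyzer reports no counterexample within scope 6; weaken or remove the fact and it produces a concrete falsifying instance instead.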

BTW, Alloy was one of the earliest tools to suggest using SAT for bounded verification. See, for example, Boolean Compilation of Relational Specifications (Daniel Jackson, 1998), a tech report known to the authors of the first bounded model checking paper. It discusses Alloy's predecessor, Nitpick, in the following terms:

The hypothesis underlying Nitpick is a controversial one. It is that, in practice, small scopes suffice. In other words, most errors can be demonstrated by counterexamples within a small scope. This is a purely empirical hypothesis, since the relevant distribution of errors cannot be described mathematically: it is determined by the specifications people write.

Our hope is that successful use of the Nitpick tool will justify the hypothesis. There is some evidence already for its plausibility. In our experience with Nitpick to date, we have not gained further information by increasing the scope beyond 6.

A similar notion of scope is implicit in the context of model checking of hardware. Although the individual state machines are usually finite, the design is frequently parameterized by the number of machines executing in parallel. This metric is analogous to scope; as the number of machines increases, the state space increases exponentially, and it is rarely possible to analyze a system involving more than a handful of machines. Fortunately, however, it seems that only small configurations are required to find errors. The celebrated analysis of the Futurebus+ cache protocol [C+95], which perhaps marked the turning point in model checking’s industrial reputation, was performed for up to 8 processors and 3 buses. The reported flaws, however, could be demonstrated with counterexamples involving at most 3 processors and 2 buses.


From my understanding of what is meant by falsification, yes, Alloy does it.

It becomes quite apparent when you look at the motivation behind the creation of Alloy, as formulated in the Software Abstractions book:

This book is the result of a 10-year effort to bridge this gap, to develop a language (Alloy) that captures the essence of software abstractions simply and succinctly, with an analysis that is fully automatic, and can expose the subtlest of flaws.