Writing programs is famously hard. Writing program that generate programs (compilers) is harder still. Compiler verification usually comes in two flavors: (1) Proving a compiler is correct by construction using a proof-assistant, or (2) proving that each compiler pass preserves the observable semantics of a program by checking the equivalence of the input and the output programs.
Non-trivial correct by construction compilers have been demonstrated to be viable for but require several person-years of work to implement, specify1, and prove correct. On the other hand, proving program equivalence automatically is a remarkably hard problem which forces such verification efforts to somehow bound the space of program behaviors.
For our project, we implemented a pass verification infrastructure for Bril using the Rosette framework and verified the correctness of a local value numbering pass.
SMT Solving, Briefy
Underyling a lot of automatic proof generation is SMT solving. Satisfiability Modulo Theories (SMT) is a generalization of the SAT problem that allows us to augment our logic with various “theories” (naturals, rationals, arrays, etc.) to prove properties in a domain that we care about. For example, SAT + theory of integers can be used to solve Integer Linear Programming problems.
Program properties can be verified by first encoding the semantics of your language as an SMT formula and asking a solver to prove it’s correctness by finding a satisfying assignment.
Rosette is a symbolic execution engine for the Racket programming language. It lets us write normal Racket programs and does the work of automatically lifting them to perform symbolic computations.
Consider the following program:
In addition to running this program with concrete inputs (like
1), Rosette allows us to run it with a symbolic input. When computing with symbolic inputs, Rosette lifts operations like
+ to return symbolic formulas instead. So, running this program with the symbolic input
x would give us the ouput
x + 1.
Rosette also lets us ask verification queries using a symbolic inputs. We can write the following program:
Rosette will convert this into an SMT formula and verify it’s correctness using a backend solver.
If we give Rosette a falsifiable formula:
Rosette generate a model where the formula is false. In this case, Rosette will report that when
x = 0, this formula is false.
A symbolic interpreter is simply an interpreter that executes over symbolic values rather than real values. A standard interpreter takes a program, such as , and a variable assignment, and does something like: . A symbolic interpreter works on the same types of programs, but takes symbols as arguments instead of value assignments. For the same program, , symbolic interpretation would produce the formula .
This proves useful for verification because it reduces the problem of program equivalence to formula equivalence. To prove that the program is equivalent to the program we only need to reduce these to formulas and then prove their equivalence. This still looks hard, but it turns out that we can use SMT solvers to do most of the hard work.
We have reduced the problem of program equivalence to symbolic interpretation plus a query to an SMT solver. Fortunately, Rosette makes both of these tasks simple. We can write a normal interpreter for Bril in Racket and Rosette will lift the computation into SMT formulas and also make the query to the SMT solver.
Limiting scope to basic blocks
SMT theories are undecidable in general and even when you restrict it to decidable fragments, verification can take a very long time. Because symbolic interpretation involves following every path in a program and the number of paths in a program increases exponentially with the size of the program, it can be difficult to make verification with symbolic interpretation scale to large programs.
We address this problem by proving basic block equivalence rather than program equivalence. By definition, there is only a single path through a basic block. This avoids the exponential growth of the number of paths to explore and means that we only ever produce relatively simple formulas that are usually fast to verify. However, this comes at the cost of exact program equivalence, we can only give a conservative approximation.
To verify that two basic blocks are equivalent, we assume that the common set of live variables equal, and ask Rosette to verify that the symbolic formulas we get from interpretation for each assigned variable are equivalent.
A simple CSE and dead code elimination produces the following code:
We first find the common set of live variables. In this case,
a, b are live at the beginning of both of these blocks. Next, we create a symbolic version of these variables for each block. We’ll use
$ to designate symbolic variables. This gives us
a$1, b$1 for the first block and
a$2, b$2 for the second block. We assume that
a$1 = a$2 and
b$1 = b$2. Then we can call our basic block symbolic interpreter with these variables to get the following formula:
Finally we check if the variables which are defined in both blocks are equivalent. In other words, assuming that the common live variables are equal, is the following true:
The SMT solver will verify this for us, and if it can’t the formula to be valid, it will provide a counter-example to prove it. In this case, it is not too hard to see that this formula is in fact valid, which shows that these two basic blocks are functionally equivalent.
The downside of this approach is that it only conservatively approximates the result of each basic block. We may lose information about constraints on variables that cross basic block boundaries. For example, consider the following toy program:
c is a copy of
a, this program would be functionally the same if you replaced the assignment to
sum: int = add a a. However, because we are only doing verification on the basic block level, we don’t know that these programs are equivalent.
Another problem is that this approach to verification relies on the existence of test programs. We are not actually analyzing the code of the optimization so if you don’t have extensive enough tests, bugs may go by unnoticed.
To evaluate Shrimp, we implemented Common sub-expression elimination (CSE) using Local value numbering (LVN) to show that Shrimp is useful in finding correctness bugs. We intentionally planted two bugs and found a third bug in the process of testing.
There are some subtleties to a correct implementation of LVN. If you know that the variable
sum1 holds the value
a + b, you have to make sure that
sum1 is not assigned to again before you use it. For example, consider the following Bril program:
We would like to replace
sum2: int = add a b with
sum2: int = id sum1 because we have already computed the value. However, if we can’t do this directly because then
sum2 would have the value
a + b. The solution is to rename the first instance of
sum1 to something unique so that we don’t lose our reference to the value
a + b. We can then replace
sum2 with a copy from this new variable.
We implemented the faulty version and ran Shrimp. It was able to show that the programs are not equivalent and even produced a counter example to prove this. With this information, it is easy to walk through the execution of the code and discover the source of the bug.
Next we tried extending CSE to deal with associativity. It would be nice if the compiler knew that
a + b is equal to
b + a so that it could eliminate more sub expressions. The most naïve thing to do is sort the arguments for all expressions when you compare them so that
a + b is the same value as
b + a. However, this by itself is not enough. Testing the following example with Shrimp reveals the problem:
Shrimp gives us the counter example
a = -8, b = -4. The problem is that we can’t sort the arguments for every instruction;
a - b != b - a. Shrimp helps to reveal this problem.
The final bug was actually an unintentional bug that Shrimp helped us find. We made the arguably bad decision to give each Bril instruction it’s own structure that is a sub-type of a
dest-instr structure rather than to give
dest-instr an op-code field. When we were looking up values in the LVN table, we were only comparing that fields in
dest-instr where the same. We forgot to compare the actual types of the instructions! Shrimp was able to reveal this code from the following example:
This made it easy to find and fix a rather embarrassing bug in the LVN implementation.
Symbolic verification provides a trade-off between verification effort and the completeness of a verification procedure. Beyond our implementation, there has also been recent work in verifying correctness of file systems, memory models, and operating systems code using symbolic verification demonstrating the flexibility of this approach to program verification.
The problem of specifying the correctness condition of a compiler is itself a non-trivial, open research problem. Should the compiler preserve the stdout behavior, or should it give even stronger guarantees such as preserving the timing behavior?↩