Verification of Neural Networks
A problem in two parts
; test_small.vnnlib
(declare-const X_0 Real)
(declare-const Y_0 Real)
(assert (or
(and
(>= X_0 -1)
(<= X_0 1)
(>= Y_0 100)
)
))
vnnlib file format, an SMT-LIB 2 dialect
Taken from the VNNCOMP 2021 benchmarks
The verification
The goal is to linearize the output layer of the neural network and verify the constraints.
Common activation functions
ReLU: $$R(x) = \begin{cases} 0 & x < 0 \\ x & x \ge 0 \end{cases}$$
Leaky ReLU: $$LR(x) = \begin{cases} ax & x < 0 \\ x & x \ge 0 \end{cases}$$
Other notable activation functions
Sigmoid: $$S(x) = \frac{1}{1 + e^{-x}}$$
Tanh: $$T(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$$
Linearize the activation functions
The approach used by $\alpha\text{-}\beta\text{-crown}$, winner of VNNCOMP 2021, 2022 and 2023, is to linearize the activation functions, reducing the error introduced by the approximation with tunable parameters.
Paper
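As a sketch of the idea (the exact formulation is in the paper): for a ReLU whose input $x$ is bounded by $l < 0 < u$, the output $\max(0, x)$ is sandwiched between a linear lower bound with a tunable slope $\alpha$ and a fixed linear upper bound,
$$\alpha x \;\le\; \max(0, x) \;\le\; \frac{u}{u - l}(x - l), \qquad \alpha \in [0, 1],$$
and $\alpha$ is then optimized (e.g. by gradient-based search) to tighten the resulting output bounds without adding new constraints.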
Convex relaxation of Neural Networks
To deal with intractability, convex relaxations can be used.
For instance, the Planet relaxation substitutes the ReLU function $y = \max(0, x)$ with
$$\begin{array}{l}
y \ge 0 \\
y \ge x \\
y \le \dfrac{u}{u - l}x - \dfrac{ul}{u - l}
\end{array}$$
where $l$ and $u$ are the lower and upper bounds of the input $x$.
They can be obtained directly or estimated.
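For illustration, the Planet relaxation of a single ReLU $y = \max(0, x)$ can be written in the same SMT-LIB syntax as the earlier snippets. This is a sketch: the bounds $l = -1$ and $u = 2$ are made up for the example.
; Planet relaxation of y = max(0, x) with assumed bounds -1 <= x <= 2
(declare-const x Real)
(declare-const y Real)
(assert (>= x (- 1)))
(assert (<= x 2))
(assert (>= y 0))
(assert (>= y x))
; y <= u/(u-l) * x - u*l/(u-l) = (2/3) x + 2/3 for l = -1, u = 2
(assert (<= y (+ (* (/ 2 3) x) (/ 2 3))))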
Propagate bounds
Bound propagation is a key step in guiding the search for a solution.
State-of-the-art tools include LiRPA and PRIMA.
dlinear's current naive implementation only propagates explicit bounds and equality constraints.
This functionality could be extended by considering the interval bounds of the input variables and propagating them through the graph of constraints, as sketched below.
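For a single linear constraint this amounts to standard interval arithmetic (a sketch, not dlinear's current behaviour): given bounds $l_j \le x_j \le u_j$ on the variables, the term $\sum_j a_j x_j + c$ satisfies
$$\sum_j \min(a_j l_j,\, a_j u_j) + c \;\le\; \sum_j a_j x_j + c \;\le\; \sum_j \max(a_j l_j,\, a_j u_j) + c.$$
The worked example under "Tightening the bounds" below is an instance of this rule.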
Comparison
| Tool | Strict constraints | Output eq $y$ | Correlated inputs | Unbounded $x$ | Or | Precision | Efficient |
| --- | --- | --- | --- | --- | --- | --- | --- |
| dlinear | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × |
| alpha-beta-crown | ✓* | × | × | × | × | $10^{-10}$ | ✓ |
| neurosat | × | ✓ | × | × | ✓ | ✓ | ✓ |
| nnenum | × | ✓ | × | × | × | $10^{-10}$ | ✓ |
| Marabou | × | ✓ | ✓ | × | ✓ | ✓ | ✓ |
$f \in V_B$: propositional variables (Formula)
$x \in V_R$: real-valued variables (Term)
$a, c \in \mathbb{R}$: constants (Term)
$\sim \in \{=, \neq, <, \leq, >, \geq\}$: comparison operators
$$\underbrace{\underbrace{a_{11} x_1 + \dots + a_{1n} x_n + c_1}_{\text{Term}} \sim \underbrace{a_{21} x_{n+1} + \dots + a_{2m} x_m + c_2}_{\text{Term}}}_{\text{Formula}}$$
Most SMT solvers expect the input in CNF form, where the $l_{ij}$ are literals:
$$(l_{00} \lor \dots \lor l_{0m_0}) \land (l_{10} \lor \dots \lor l_{1m_1}) \land \dots \land (l_{n0} \lor \dots \lor l_{nm_n})$$
If-then-else terms
If $f, f_1, f_2$ are formulas, $\text{ite}(f, f_1, f_2)$ is a formula equivalent to $(f \land f_1) \lor (\neg f \land f_2)$
If $t_1, t_2$ are terms and $f$ is a formula, $\text{term-ite}(f, t_1, t_2)$ is a term that evaluates to $t_1$ when $f$ holds and to $t_2$ otherwise
Piecewise linear functions to ITE
Piecewise linear functions can be represented using if-then-else terms
; ReLU
(declare-const x Real)
(declare-const y Real)
(assert (= y (ite (<= x 0) 0 x)))
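Other piecewise linear functions follow the same pattern. As an illustrative sketch, a leaky ReLU (the slope 0.01 is made up for the example) and the max of two inputs can be written as:
; Leaky ReLU, slope 0.01 chosen for illustration
(declare-const x Real)
(declare-const y Real)
(assert (= y (ite (<= x 0) (* 0.01 x) x)))
; max(x1, x2) as a term-ite
(declare-const x1 Real)
(declare-const x2 Real)
(declare-const z Real)
(assert (= z (ite (>= x1 x2) x1 x2)))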
ITE to CNF
The if-then-else term can be converted to CNF by introducing a fresh variable $c$ with the following equisatisfiability relation
$$f(\text{term-ite}(g, t_1, t_2)) \;\equiv\; f(c) \land \text{ite}(g,\ t_1 = c,\ t_2 = c)$$
e.g.
$$\text{term-ite}(g, 1, 2) = \text{term-ite}(h, 3, 4)$$
becomes
$$\text{ite}(g,\ c = 1,\ c = 2) \land \text{ite}(h,\ c = 3,\ c = 4)$$
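For example, applying this rule to the ReLU constraint above, $y = \text{term-ite}(x \le 0,\ 0,\ x)$, with a fresh variable $c$ yields
$$(y = c) \land \text{ite}(x \le 0,\ c = 0,\ c = x) \;\equiv\; (y = c) \land (x > 0 \lor c = 0) \land (x \le 0 \lor c = x),$$
which is already in CNF.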
Max encoding
The max function can be seen as a special case of an ITE term.
Exploiting its characteristics, introducing two fresh variables $a_1, a_2$ makes it possible to encode it directly in CNF:
$$\begin{array}{lcl}
y = \max(x_1, x_2) & \implies & (y - x_1 = a_1) \land (a_1 \ge 0) \;\land \\
& & (y - x_2 = a_2) \land (a_2 \ge 0) \;\land \\
& & (a_1 \le 0 \lor a_2 \le 0)
\end{array}$$
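The same encoding written in the SMT-LIB syntax used above (a sketch; $a_1, a_2$ become explicit slack variables):
; y = max(x1, x2) encoded in CNF with slack variables a1, a2
(declare-const x1 Real)
(declare-const x2 Real)
(declare-const y Real)
(declare-const a1 Real)
(declare-const a2 Real)
(assert (= a1 (- y x1)))
(assert (>= a1 0))
(assert (= a2 (- y x2)))
(assert (>= a2 0))
(assert (or (<= a1 0) (<= a2 0)))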
Linear layers, non-linear activation layers
Given a neural network with $L$ layers, we can divide them into two categories:
Linear layers: $f_i(x) = W_i x + b_i$
Input: $x \in \mathbb{R}^m$, weights: $W_i \in \mathbb{R}^{n \times m}$, bias: $b_i \in \mathbb{R}^n$
Activation layers: non-linear $f$
Piecewise linear functions
General non-linear functions
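For illustration, a linear layer with two inputs and two outputs followed by a ReLU activation can be encoded directly with the constructs above. The coefficients are the same ones used in the worked example of the next section; the variable names are illustrative.
; Linear layer z = W x + b, followed by ReLU y = max(0, z)
(declare-const x1 Real)
(declare-const x2 Real)
(declare-const z1 Real)
(declare-const z2 Real)
(declare-const y1 Real)
(declare-const y2 Real)
(assert (= z1 (+ (* 2 x1) (* 3 x2) (- 1))))
(assert (= z2 (+ (* 4 x1) (* (- 2) x2) 3)))
(assert (= y1 (ite (<= z1 0) 0 z1)))
(assert (= y2 (ite (<= z2 0) 0 z2)))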
Tightening the bounds
Converting all the layers of a neural network up to an activation layer into linear constraints makes it possible to compute bounds on the input of that activation layer (and hence on its output), as long as the network's input is bounded.
$$\begin{array}{lcl}
-1 \le x_1 \le 1 & & \\
-4 \le x_2 \le 7 & & \\
r_1 = 2x_1 + 3x_2 - 1 & \implies & -15 \le r_1 \le 22 \\
r_2 = 4x_1 - 2x_2 + 3 & \implies & -15 \le r_2 \le 15
\end{array}$$
Fixing the piecewise linear functions
If the bounds on the input of the activation layer are tight enough, it may be possible to fix the piecewise linear term to a single linear piece.
$$\begin{array}{l}
0 \le x_1 \le 1 \\
4 \le x_2 \le 7 \\
r_1 = \begin{cases} 2x_1 + 3x_2 - 1 & \text{if } 2x_1 + 3x_2 - 1 > 0 \\ 0 & \text{otherwise} \end{cases} \implies r_1 = 2x_1 + 3x_2 - 1
\end{array}$$
Here the pre-activation is bounded below by $2 \cdot 0 + 3 \cdot 4 - 1 = 11 > 0$, so the ReLU is always in its active phase.
Sum of Infeasibilities
Instead of adding the non-fixed activation functions as hard constraints of the LP problem, their violation can be collected in the objective and the sum of infeasibilities minimized.
Sorting them by the violation they introduce gives a way to prioritize the search for a solution.
$$\begin{array}{ll}
\min & r_1 - (2x_1 + 3x_2 - 1) + r_2 \\
\text{s.t.} & -1 \le x_1 \le 1 \\
 & -4 \le x_2 \le 7 \\
 & r_1, r_2 \ge 0 \\
 & r_1 \ge 2x_1 + 3x_2 - 1 \\
 & r_2 \ge 4x_1 - 2x_2 + 3
\end{array}$$
Completeness vs Real world
SMT solvers aim for a complete approach, a mathematical solution of the problem, employing symbolic representation of the inputs and exact arithmetic (when possible).
In the real world, however, speed of computation is usually the main concern, hence floating-point arithmetic is almost always used.
As a result, the solution found by the SMT solver may not match the one computed by the neural network inference engine (e.g. OnnxRuntime).
Future work
Benchmarks
Other heuristics to optimize the search for the solution
Use over-approximation of bounds to reduce the search space
How much completeness are we sacrificing?