Verification of Neural Networks
A problem in two parts
; test_small.vnnlib
(declare-const X_0 Real)
(declare-const Y_0 Real)
(assert (or
(and
(>= X_0 -1)
(<= X_0 1)
(>= Y_0 100)
)
))
vnnlib file format, an SMT-LIB 2 dialect
Taken from the VNNCOMP 2021 benchmarks
The verification
The goal is to linearize the output layer of the neural network and verify the constraints.
Common activation functions
ReLU: $$R(x) = \begin{cases} 0 & x < 0 \\ x & x \ge 0 \end{cases}$$
Leaky ReLU: $$LR(x) = \begin{cases} ax & x < 0 \\ x & x \ge 0 \end{cases}$$
Other notable activation functions
Sigmoid: $$S(x) = \frac{1}{1 + e^{-x}}$$
Tanh: $$T(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$$
Linearize the activation functions
The approach used by $\alpha\text{-}\beta\text{-crown}$, winner of VNNCOMP 2021, 2022 and 2023, is to linearize the activation functions, reducing the error introduced by the approximation with tunable parameters.
Paper
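As a sketch of the idea (the exact formulation is in the paper): for a ReLU whose input $x$ is bounded by $l < 0 < u$, the output $\max(0, x)$ is sandwiched between a linear lower bound with a tunable slope $\alpha$ and a fixed linear upper bound,
$$\alpha x \;\le\; \max(0, x) \;\le\; \frac{u}{u - l}(x - l), \qquad \alpha \in [0, 1],$$
and $\alpha$ is then optimized (e.g. by gradient-based search) to tighten the resulting output bounds without adding new constraints.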
Convex relaxation of Neural Networks
To deal with intractability, convex relaxations can be used.
For instance, the Planet relaxation substitutes the ReLU function $y = \max(0, x)$ with
$$\begin{array}{l}
y \ge 0 \\
y \ge x \\
y \le \dfrac{u}{u - l}x - \dfrac{ul}{u - l}
\end{array}$$
where $l$ and $u$ are the lower and upper bounds of the input $x$.
They can be obtained directly or estimated.
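For illustration, the Planet relaxation of a single ReLU $y = \max(0, x)$ can be written in the same SMT-LIB syntax as the earlier snippets. This is a sketch: the bounds $l = -1$ and $u = 2$ are made up for the example.
; Planet relaxation of y = max(0, x) with assumed bounds -1 <= x <= 2
(declare-const x Real)
(declare-const y Real)
(assert (>= x (- 1)))
(assert (<= x 2))
(assert (>= y 0))
(assert (>= y x))
; y <= u/(u-l) * x - u*l/(u-l) = (2/3) x + 2/3 for l = -1, u = 2
(assert (<= y (+ (* (/ 2 3) x) (/ 2 3))))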
Propagate bounds
Bound propagation is a key step in guiding the search for a solution.
State-of-the-art tools include LiRPA and PRIMA.
dlinear's current naive implementation only propagates explicit bounds and equality constraints.
This functionality could be extended by considering the interval bounds of the input variables and propagating them through the graph of constraints, as sketched below.
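For a single linear constraint this amounts to standard interval arithmetic (a sketch, not dlinear's current behaviour): given bounds $l_j \le x_j \le u_j$ on the variables, the term $\sum_j a_j x_j + c$ satisfies
$$\sum_j \min(a_j l_j,\, a_j u_j) + c \;\le\; \sum_j a_j x_j + c \;\le\; \sum_j \max(a_j l_j,\, a_j u_j) + c.$$
The worked example under "Tightening the bounds" below is an instance of this rule.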
Comparison
| Tool | Strict constraints | Output eq $y$ | Correlated inputs | Unbounded $x$ | Or | Precision | Efficient |
| --- | --- | --- | --- | --- | --- | --- | --- |
| dlinear | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × |
| alpha-beta-crown | ✓* | × | × | × | × | $10^{-10}$ | ✓ |
| neurosat | × | ✓ | × | × | ✓ | ✓ | ✓ |
| nnenum | × | ✓ | × | × | × | $10^{-10}$ | ✓ |
| Marabou | × | ✓ | ✓ | × | ✓ | ✓ | ✓ |
$f \in V_B$: propositional variables (Formula)
$x \in V_R$: real-valued variables (Term)
$a, c \in \mathbb{R}$: constants (Term)
$\sim \in \{=, \neq, <, \leq, >, \geq\}$: comparison operators
$$\underbrace{\underbrace{a_{11} x_1 + \dots + a_{1n} x_n + c_1}_{\text{Term}} \sim \underbrace{a_{21} x_{n+1} + \dots + a_{2m} x_m + c_2}_{\text{Term}}}_{\text{Formula}}$$
Most SMT solvers expect the input in CNF form, where the $l_{ij}$ are literals:
$$(l_{00} \lor \dots \lor l_{0m_0}) \land (l_{10} \lor \dots \lor l_{1m_1}) \land \dots \land (l_{n0} \lor \dots \lor l_{nm_n})$$
If-then-else terms
If $f, f_1, f_2$ are formulas, $\text{ite}(f, f_1, f_2)$ is a formula equivalent to $(f \land f_1) \lor (\neg f \land f_2)$
If $t_1, t_2$ are terms and $f$ is a formula, $\text{term-ite}(f, t_1, t_2)$ is a term that evaluates to $t_1$ when $f$ holds and to $t_2$ otherwise
Piecewise linear functions to ITE
Piecewise linear functions can be represented using if-then-else terms
; ReLU
(declare-const x Real)
(declare-const y Real)
(assert (= y (ite (<= x 0) 0 x)))
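Other piecewise linear functions follow the same pattern. As an illustrative sketch, a leaky ReLU (the slope 0.01 is made up for the example) and the max of two inputs can be written as:
; Leaky ReLU, slope 0.01 chosen for illustration
(declare-const x Real)
(declare-const y Real)
(assert (= y (ite (<= x 0) (* 0.01 x) x)))
; max(x1, x2) as a term-ite
(declare-const x1 Real)
(declare-const x2 Real)
(declare-const z Real)
(assert (= z (ite (>= x1 x2) x1 x2)))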
ITE to CNF
The if-then-else term can be converted to CNF by introducing a fresh variable $c$ with the following equisatisfiability relation
$$f(\text{term-ite}(g, t_1, t_2)) \;\equiv\; f(c) \land \text{ite}(g,\ t_1 = c,\ t_2 = c)$$
e.g.
$$\text{term-ite}(g, 1, 2) = \text{term-ite}(h, 3, 4)$$
becomes
$$\text{ite}(g,\ c = 1,\ c = 2) \land \text{ite}(h,\ c = 3,\ c = 4)$$
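For example, applying this rule to the ReLU constraint above, $y = \text{term-ite}(x \le 0,\ 0,\ x)$, with a fresh variable $c$ yields
$$(y = c) \land \text{ite}(x \le 0,\ c = 0,\ c = x) \;\equiv\; (y = c) \land (x > 0 \lor c = 0) \land (x \le 0 \lor c = x),$$
which is already in CNF.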
Max encoding
The max function can be seen as a special case of an ITE term.
Exploiting its characteristics, introducing two fresh variables $a_1, a_2$ makes it possible to encode it directly in CNF:
$$\begin{array}{lcl}
y = \max(x_1, x_2) & \implies & (y - x_1 = a_1) \land (a_1 \ge 0) \;\land \\
& & (y - x_2 = a_2) \land (a_2 \ge 0) \;\land \\
& & (a_1 \le 0 \lor a_2 \le 0)
\end{array}$$
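The same encoding written in the SMT-LIB syntax used above (a sketch; $a_1, a_2$ become explicit slack variables):
; y = max(x1, x2) encoded in CNF with slack variables a1, a2
(declare-const x1 Real)
(declare-const x2 Real)
(declare-const y Real)
(declare-const a1 Real)
(declare-const a2 Real)
(assert (= a1 (- y x1)))
(assert (>= a1 0))
(assert (= a2 (- y x2)))
(assert (>= a2 0))
(assert (or (<= a1 0) (<= a2 0)))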
Linear layers, non-linear activation layers
Given a neural network with $L$ layers, we can divide them into two categories:
Linear layers: $f_i(x) = W_i x + b_i$
Input: $x \in \mathbb{R}^m$, weights: $W_i \in \mathbb{R}^{n \times m}$, bias: $b_i \in \mathbb{R}^n$
Activation layers: non-linear $f$
Piecewise linear functions
General non-linear functions
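For illustration, a linear layer with two inputs and two outputs followed by a ReLU activation can be encoded directly with the constructs above. The coefficients are the same ones used in the worked example of the next section; the variable names are illustrative.
; Linear layer z = W x + b, followed by ReLU y = max(0, z)
(declare-const x1 Real)
(declare-const x2 Real)
(declare-const z1 Real)
(declare-const z2 Real)
(declare-const y1 Real)
(declare-const y2 Real)
(assert (= z1 (+ (* 2 x1) (* 3 x2) (- 1))))
(assert (= z2 (+ (* 4 x1) (* (- 2) x2) 3)))
(assert (= y1 (ite (<= z1 0) 0 z1)))
(assert (= y2 (ite (<= z2 0) 0 z2)))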
Tightening the bounds
Converting all the layers of a neural network up to an activation layer into linear constraints makes it possible to compute bounds on the input of that activation layer (and hence on its output), as long as the network's input is bounded.
$$\begin{array}{lcl}
-1 \le x_1 \le 1 & & \\
-4 \le x_2 \le 7 & & \\
r_1 = 2x_1 + 3x_2 - 1 & \implies & -15 \le r_1 \le 22 \\
r_2 = 4x_1 - 2x_2 + 3 & \implies & -15 \le r_2 \le 15
\end{array}$$
Fixing the piecewise linear functions
If the bounds on the input of the activation layer are tight enough, it may be possible to fix the piecewise linear term to a single linear piece.
$$\begin{array}{l}
0 \le x_1 \le 1 \\
4 \le x_2 \le 7 \\
r_1 = \begin{cases} 2x_1 + 3x_2 - 1 & \text{if } 2x_1 + 3x_2 - 1 > 0 \\ 0 & \text{otherwise} \end{cases} \implies r_1 = 2x_1 + 3x_2 - 1
\end{array}$$
Here the pre-activation is bounded below by $2 \cdot 0 + 3 \cdot 4 - 1 = 11 > 0$, so the ReLU is always in its active phase.
Sum of Infeasibilities
Instead of adding the non-fixed activation functions as hard constraints of the LP problem, their violation can be collected in the objective and the sum of infeasibilities minimized.
Sorting them by the violation they introduce gives a way to prioritize the search for a solution.
$$\begin{array}{ll}
\min & r_1 - (2x_1 + 3x_2 - 1) + r_2 \\
\text{s.t.} & -1 \le x_1 \le 1 \\
 & -4 \le x_2 \le 7 \\
 & r_1, r_2 \ge 0 \\
 & r_1 \ge 2x_1 + 3x_2 - 1 \\
 & r_2 \ge 4x_1 - 2x_2 + 3
\end{array}$$
Completeness vs Real world
SMT solvers aim for a complete approach, a mathematical solution of the problem, employing symbolic representation of the inputs and exact arithmetic (when possible).
In the real world, however, speed of computation is usually the main concern, hence floating-point arithmetic is almost always used.
As a result, the solution found by the SMT solver may not match the one computed by the neural network inference engine (e.g. OnnxRuntime).
Future work
Benchmarks
Other heuristics to optimize the search for the solution
Use over-approximation of bounds to reduce the search space
How much completeness are we sacrificing?