A Data-Driven Framework for Certifying Safety in Systems with Black-Box Stochastic Dynamics
By Oliver Schön, Zhengang Zhong, Sadegh Soudjani
How can we learn a safety certificate for a system with unknown stochastic dynamics?
Unknown: we don't know the model.
Unknown + noisy: we don't know the model, and the output is stochastic.
More formally, we are interfacing with a transition function

$$f : X \times U \times W \to X$$

such that, given the current state $x_t \in X$ and control input $u_t \in U$, the next state $x_{t+1} \in X$ is given by

$$x_{t+1} = f(x_t, u_t, w_t), \qquad w_t \sim P_w.$$

System safety can be proven via a barrier certificate.
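To make the black-box setting concrete, here is a minimal sketch of interacting with such a system. The dynamics `f`, the controller, and the noise distribution below are illustrative assumptions, not taken from the paper; in practice we can only query the system, not inspect it.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical black-box dynamics (illustrative only): a damped
# linear update with additive Gaussian noise w_t ~ P_w.
def f(x, u, w):
    return 0.9 * x + 0.1 * u + w

x = 1.0
for t in range(50):
    u = -0.5 * x                # an arbitrary feedback controller
    w = rng.normal(0.0, 0.01)   # sample the process noise w_t
    x = f(x, u, w)              # we only observe the resulting next state
```

From the outside we see only the sequence of states; the form of `f` and the distribution $P_w$ stay hidden.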
More formally, given a state space $X$, an initial set $X_0 \subseteq X$, and an unsafe set $X_u \subseteq X$, we want to learn a function $B : X \to \mathbb{R}$ and constants $\gamma \ge \eta \ge 0$, $c \ge 0$ such that

$$
\begin{aligned}
\text{(i)} &\quad \forall x \in X: & B(x) &\ge 0 \\
\text{(ii)} &\quad \forall x_0 \in X_0: & B(x_0) &\le \eta \\
\text{(iii)} &\quad \forall x_u \in X_u: & B(x_u) &\ge \gamma \\
\text{(iv)} &\quad \forall x \in X: & \mathbb{E}[B(x_{t+1}) \mid x_t = x] - B(x) &\le c
\end{aligned}
$$

If the barrier exists, for the horizon $T \in \mathbb{N}$ we have
$$\mathbb{P}(\text{System safe}) \ge 1 - \frac{\eta + cT}{\gamma}$$

The main obstacle to learning the barrier is the unknown, stochastic, non-linear dynamics.
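As a quick numerical check of the safety bound above, with illustrative constants (not from the paper):

```python
def safety_lower_bound(eta, gamma, c, T):
    # P(system safe over horizon T) >= 1 - (eta + c*T) / gamma
    return 1 - (eta + c * T) / gamma

# Illustrative values: a tight initial level eta, unit unsafe level gamma,
# and a small expected increase c per step over T = 10 steps.
bound = safety_lower_bound(eta=0.1, gamma=1.0, c=0.005, T=10)
print(bound)
```

Smaller $\eta$ and $c$ (and larger $\gamma$) tighten the guarantee; the bound degrades linearly with the horizon $T$.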
We must gather information by collecting data from the system, sampling it $N \in \mathbb{N}$ times.
A symmetric positive-definite function $k : X \times X \to \mathbb{R}$ defines a reproducing kernel Hilbert space (RKHS) $H_k^X$ of functions $f : X \to \mathbb{R}$.
Kernels have a useful reproducing property:
$$f(x) = \langle f, k(x, \cdot) \rangle_{H_k^X} \qquad \forall x \in X,\; f \in H_k^X$$

**Tip:** If the dynamics are in $H_k^X$, we can reproduce them via $k$.
Multiple kernels exist, each with different properties.
Given two RKHSs $H_k^X$ and $H_k^Y$, the conditional mean embedding (CME) of a conditional probability measure $P : X \times \mathcal{B}(Y) \to [0,1]$ is an $X$-measurable random variable taking values in $H_k^Y$, given by

$$\mu_{k_Y \mid k_X}(P)(\cdot) := \mathbb{E}_P[\phi_{k_Y}(Y) \mid X = \cdot]$$

Using the CME, we can rewrite the last barrier condition as
$$\text{(iv)} \quad \forall x \in X: \quad \langle B, \mu_{X^+ \mid X}(f)(x) \rangle_{H_k} - B(x) \le c - \underbrace{\varepsilon_C \, \|B\|_{H_k}}_{\text{constant}}$$

The robustness is given by the last term. It depends on the maximum frequency we need to capture the transition function.
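The CME is useful because it can be estimated directly from samples. Below is a minimal sketch (my own illustrative example, not the paper's pipeline): with Gaussian-kernel ridge regression, an empirical estimate of $\mathbb{E}[B(x_{t+1}) \mid x_t = x]$ is $B(X^+)^T (K + N\lambda I)^{-1} k(X, x)$, where $(X, X^+)$ are sampled transition pairs. Toy noiseless dynamics $x^+ = \sin(x)$ are used so the estimate can be checked against the true conditional expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

def k(a, b, sigma_l=0.3):
    # Gaussian kernel Gram matrix between point sets a and b
    a, b = np.atleast_1d(a), np.atleast_1d(b)
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma_l**2))

# Sampled transitions from toy (noiseless) dynamics x+ = sin(x)
N = 200
X = rng.uniform(-3, 3, N)
Xp = np.sin(X)

B = lambda x: x**2  # a candidate barrier, evaluated at the next state

# Empirical CME estimate of E[B(x+) | x] via kernel ridge regression
lam = 1e-6
alpha = np.linalg.solve(k(X, X) + N * lam * np.eye(N), B(Xp))
cond_exp_B = lambda x: float(k(x, X) @ alpha)

estimate = cond_exp_B(0.7)  # should be close to sin(0.7)**2
```

The same weights `alpha` work for any $B$ expressed on the samples, which is what makes condition (iv) tractable from data.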
The Gaussian kernel can be approximated via a truncated Fourier series expansion:

$$f \in H_k^X \implies f(x) \approx b^T \phi(x) = b^T \begin{bmatrix} q_0 \\ q_1 \cos(\omega_1 x) \\ q_1 \sin(\omega_1 x) \\ \vdots \\ q_n \cos(\omega_n x) \\ q_n \sin(\omega_n x) \end{bmatrix}$$

with $n \in \mathbb{N}$, $b \in \mathbb{R}^{2n+1}$, and $q, \omega \in \mathbb{R}^n$.
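A sketch of why this works, assuming the standard spectral construction (frequencies $\omega_n = 2\pi n / L$ on a period $L$, with weights $q_n$ taken from the Gaussian kernel's spectral density): the inner product of two truncated feature maps recovers the kernel. The names and formulas below are my own illustration, not Lucid's implementation.

```python
import numpy as np

sigma_f, sigma_l, L, n = 1.0, 0.5, 10.0, 30

def gauss_kernel(r):
    return sigma_f**2 * np.exp(-r**2 / (2 * sigma_l**2))

def spectral_density(w):
    # Fourier transform of the Gaussian kernel
    return sigma_f**2 * np.sqrt(2 * np.pi) * sigma_l * np.exp(-(sigma_l * w) ** 2 / 2)

def phi(x):
    # Truncated feature map [q0, q_n cos(w_n x)..., q_n sin(w_n x)...]
    # (cosines grouped before sines; the dot product is unchanged)
    w = 2 * np.pi * np.arange(1, n + 1) / L
    q0 = np.sqrt(spectral_density(0.0) / L)
    q = np.sqrt(2 * spectral_density(w) / L)
    return np.concatenate(([q0], q * np.cos(w * x), q * np.sin(w * x)))

# phi(x) . phi(y) approximates k(x - y) for |x - y| well inside the period
x, y = 0.3, 1.1
approx, exact = phi(x) @ phi(y), gauss_kernel(x - y)
```

The approximation error has two parts: truncation (frequencies above $\omega_n$) and periodisation (the domain length $L$ must be large relative to the lengthscale $\sigma_l$).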
We want to create a tool called Lucid that provides probabilistic guarantees on a system's properties.
It will leverage the techniques we just presented.
It should be easy to use and install without sacrificing performance.
Define the system and sample some points.
```python
N = 1000

# Transition function f
f = lambda x: ...

# State space
X_bounds = RectSet((-3, -2), (2.5, 1))

# Initial set
X_init = MultiSet(
    RectSet((1, -0.5), (2, 0.5)),
    RectSet((-1.8, -0.1), (-1.2, 0.1)),
    RectSet((-1.4, -0.5), (-1.2, 0.1)),
)

# Unsafe set
X_unsafe = MultiSet(
    RectSet((0.4, 0.1), (0.6, 0.5)),
    RectSet((0.4, 0.1), (0.8, 0.3)),
)

x_samples = X_bounds.sample(N)
# We get the next states by applying f
xp_samples = f(x_samples)
```
Use the sampled points to train a model that predicts the truncated feature map of the next state.
```python
# Create a truncated Fourier feature map
tffm = TruncatedFourierFeatureMap(
    num_frequencies=6,  # includes the 0 frequency
    sigma_l=[0.1, 0.2],
    sigma_f=1,
    x_limits=X_bounds,
)

# Apply the feature map to the next-state samples
xp_features = tffm(xp_samples)

# Train the model so that it can predict the feature map
model = KernelRidgeRegressor(
    kernel=GaussianKernel(
        sigma_l=[0.1, 0.2],
        sigma_f=1,
    ),
    regularisation=1e-6,
)
model.fit(x_samples, xp_features)
```
We can now use the model to learn the barrier on a lattice of points.
```python
# Create a lattice of points. Per dimension it must contain at least
# 2x the number of frequencies (Nyquist rate).
x_lattice = X_bounds.lattice(6 * 2, True)
xi_lattice = X_init.lattice(6 * 2, True)
xu_lattice = X_unsafe.lattice(6 * 2, True)

# Obtain the feature maps on the lattices
x_f_lattice = tffm(x_lattice)
xp_f_lattice = model.predict(x_lattice)
xi_f_lattice = tffm(xi_lattice)
xu_f_lattice = tffm(xu_lattice)

# Efficiently generate more points using the FFT
x_f_l_upsampled = fft_upsample(x_f_lattice, 2)
xp_f_l_upsampled = fft_upsample(xp_f_lattice, 2)
xi_f_l_upsampled = fft_upsample(xi_f_lattice, 2)
xu_f_l_upsampled = fft_upsample(xu_f_lattice, 2)
```
We can finally learn the barrier by solving a linear program.
```python
# Define the callback function
cb = lambda success, obj_val, sol, eta, c, norm: ...

# Initialise and run the LP
GurobiLinearOptimiser(
    T=10,
    gamma=1,
    epsilon=0,
    sigma_f=1.0,
).solve(
    xi_f_l_upsampled,
    xu_f_l_upsampled,
    x_f_l_upsampled,
    xp_f_l_upsampled,
    ...,
    cb,
)
```
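For intuition, the optimisation can be sketched as a plain LP: with $B(x) = b^T \phi(x)$, conditions (i)–(iv) become linear constraints in $(b, \eta, c)$ at the lattice points, and minimising $\eta + cT$ maximises the safety bound. The toy below is my own simplification, not Lucid's solver: 1-D deterministic dynamics $x^+ = 0.5x$, polynomial features instead of Fourier features, and SciPy instead of Gurobi.

```python
import numpy as np
from scipy.optimize import linprog

# Toy setting (illustrative): features phi(x) = [1, x, x^2]
phi = lambda x: np.stack([np.ones_like(x), x, x**2], axis=1)

X = np.linspace(-2.0, 2.0, 41)    # state-space lattice
X0 = np.linspace(-0.5, 0.5, 11)   # initial-set lattice
Xu = np.linspace(1.5, 2.0, 6)     # unsafe-set lattice
gamma, T = 1.0, 10

# Decision variables z = [b0, b1, b2, eta, c]; constraints A_ub @ z <= b_ub
rows, rhs = [], []
for row in -phi(X):                # (i)   B(x) >= 0
    rows.append([*row, 0.0, 0.0]); rhs.append(0.0)
for row in phi(X0):                # (ii)  B(x0) <= eta
    rows.append([*row, -1.0, 0.0]); rhs.append(0.0)
for row in -phi(Xu):               # (iii) B(xu) >= gamma
    rows.append([*row, 0.0, 0.0]); rhs.append(-gamma)
for row in phi(0.5 * X) - phi(X):  # (iv)  B(f(x)) - B(x) <= c, f(x) = 0.5 x
    rows.append([*row, 0.0, -1.0]); rhs.append(0.0)

# Minimise eta + c*T, i.e. maximise the bound 1 - (eta + c*T)/gamma
res = linprog(c=[0, 0, 0, 1.0, T], A_ub=rows, b_ub=rhs,
              bounds=[(None, None)] * 3 + [(0, None), (0, None)])
bound = 1 - res.fun / gamma
```

For example, $B(x) = x^2$ with $\eta = 0.25$, $c = 0$ is feasible here, so the solver certifies a safety probability of at least $0.75$ over the horizon.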
A lot to do!