What is
print(1 + 2) # 3
What is
print(1.0 + 2.0) # 3.0
What is
print(0.1 + 0.2) # 0.30000000000000004
How do we represent numbers in a computer?
Computers only store bits, usually in groups of 8, 16, 32, or 64.
It is natural to represent integers in binary.
Diagram: a 32-bit integer, e.g. $10111_2 = 23$, with $\min = 0 \le \text{value} \le \max = 2^{31} - 1 = 2{,}147{,}483{,}647$.
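For instance, in Python:

print(0b10111)      # 23
print(bin(23))      # 0b10111
print(2**31 - 1)    # 2147483647, the largest 32-bit signed value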
What about decimal numbers?
A natural way is to use fixed-point numbers.
Diagram: a fixed-point number, e.g. $10111.101_2 = 23.625$, with $\min = 0 \le \text{value} \le \max = 11\cdots1.11\cdots1_2$ (15 integer and 16 fractional bits) $= 32767.99998474121$.
Pros
Cons
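A minimal sketch of fixed-point arithmetic with 16 fractional bits, as in the diagram above: values are stored as integers scaled by $2^{16}$. The helper names to_fixed/from_fixed are just illustrative.

SCALE = 2 ** 16                       # 16 fractional bits

def to_fixed(x):                      # illustrative helper, not a library function
    return round(x * SCALE)

def from_fixed(q):
    return q / SCALE

print(from_fixed(to_fixed(23.625)))   # 23.625  -- exactly representable
print(from_fixed(to_fixed(0.1)))      # 0.100006103515625  -- rounded to the grid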
The IEEE 754 standard defines floating-point numbers.
Diagram: a floating-point number, e.g. $1.101_2 \times 2^2 = 6.5$, with $\min = 0 \le \text{value} \le \max = 1.11\cdots1_2 \times 2^{127}$ (23 fraction bits) $= (2 - 2^{-23}) \times 2^{127}$.
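The bit pattern of a 32-bit float can be inspected directly; for 6.5 the sign is 0, the biased exponent is $129 = 127 + 2$, and the fraction is $101$ followed by zeros, i.e. $1.101_2 \times 2^2$:

import struct

bits = struct.unpack(">I", struct.pack(">f", 6.5))[0]
print(f"{bits:032b}")   # 01000000110100000000000000000000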
The number of bits is still finite.
#include <stdio.h>

int main() {
    float meters = 0;                 /* single-precision accumulator */
    int iterations = 100000000;
    for (int i = 0; i < iterations; i++) {
        meters += 0.01;               /* add one centimetre, 10^8 times */
    }
    printf("Expected: %f km\n", 0.01 * iterations / 1000);
    printf("Got: %f km\n", meters / 1000);
}
Output:
// Expected: 1000.000000 km
// Got: 262.144012 km
The gap between consecutive representable values is too large: near the result, the neighbouring float values are 262144.0, 262144.03125, and 262144.0625, so each added 0.01 is rounded away.
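A quick numpy check (numpy.spacing returns the gap to the next representable value):

import numpy as np

x = np.float32(262144.0)
print(np.spacing(x))                  # 0.03125: the gap between neighbouring float32 values
print(x + np.float32(0.01) == x)      # True: adding 0.01 no longer changes the value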
Compute the LU factorisation of
$$A = \begin{bmatrix} \epsilon & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ l_{21} & 1 \end{bmatrix}\begin{bmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{bmatrix}$$
$$u_{11} = \epsilon, \qquad u_{12} = -1, \qquad l_{21} = \epsilon^{-1}, \qquad u_{22} = 1 + \epsilon^{-1} \approx \epsilon^{-1}$$
$$\hat{L}\hat{U} = \begin{bmatrix} \epsilon & -1 \\ 1 & 0 \end{bmatrix}$$
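The same effect can be reproduced in a few lines of plain Python (double precision, with $\epsilon = 10^{-20}$ chosen arbitrarily for illustration): the computed factors multiply back to a matrix whose (2,2) entry is 0 instead of 1.

eps = 1e-20                          # tiny pivot, chosen arbitrarily for illustration
a = [[eps, -1.0],
     [1.0,  1.0]]

u11, u12 = a[0][0], a[0][1]
l21 = a[1][0] / u11                  # 1e20
u22 = a[1][1] - l21 * u12            # fl(1 + 1e20) = 1e20: the 1 is lost

lhat_uhat = [[u11, u12],
             [l21 * u11, l21 * u12 + u22]]
print(lhat_uhat)                     # [[1e-20, -1.0], [1.0, 0.0]] instead of A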
The standard dictates that the result of an operation is computed exactly and then rounded:
$$\begin{aligned}
a + b &\implies \mathrm{fl}(a+b) = (a+b)(1+\delta) \\
a - b &\implies \mathrm{fl}(a-b) = (a-b)(1+\delta) \\
a \times b &\implies \mathrm{fl}(a \times b) = (a \times b)(1+\delta) \\
a \div b &\implies \mathrm{fl}(a \div b) = (a \div b)(1+\delta)
\end{aligned}$$
with $|\delta| \le \epsilon$.
Overflows and underflows are also possible, but they are detectable.
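For example, in Python (double precision) an overflow shows up as an infinity, which can be tested for directly:

import math

x = 1e308 * 10
print(x)              # inf: the overflow is visible in the result
print(math.isinf(x))  # True, so it can be detected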
Let the exponent and mantissa be 4 and 7 (+1) bits.
$$252 + 1.5 = 1.1111100_2 \times 2^7 + 1.1_2 \times 2^0 = \,?$$
The exact result $252 + 1.5 = 253.5 = 1.11111011_2 \times 2^7$ would need 8 bits (+1) to be represented.
Stored: $1.1111101_2 \times 2^7 = 253.0$.
Errors propagate through operations.
$$\mathrm{fl}(a+b+c) = \big((a+b)(1+\delta_1) + c\big)(1+\delta_2)$$
The exact value depends on the order of operations, making floating-point arithmetic non-associative.
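A two-line Python check of non-associativity:

print((0.1 + 0.2) + 0.3)   # 0.6000000000000001
print(0.1 + (0.2 + 0.3))   # 0.6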
Error analysis is the study of how floating-point computations affect our results.
Forward error analysis vs. backward error analysis: the forward error measures how far the computed result is from the exact one, while the backward error asks for how small a perturbation of the inputs the computed result is exact.
The inner product of two column vectors is
For $x, y \in \mathbb{R}^n$,
$$\hat{s} = \mathrm{fl}(x^T y) = \mathrm{fl}\Big(\sum_{i=1}^n x_i y_i\Big)$$
$$\hat{s} = x_1 y_1 (1+\delta_1^\times)\prod_{i=2}^n (1+\delta_i^+) \;+\; \sum_{i=2}^n x_i y_i (1+\delta_i^\times)\prod_{j=i}^n (1+\delta_j^+)$$
Each product of at most $n$ factors $(1+\delta_i)^{\rho_i}$ with $\rho_i \in \{-1, 1\}$ can be written as $1 + \theta_n$ with $|\theta_n| \le \gamma_n := \dfrac{n\epsilon}{1 - n\epsilon}$.
Using the backward error analysis,
$$\mathrm{fl}(x^T y) = (x + \Delta x)^T y = x^T (y + \Delta y)$$
with
$$|\Delta x| \le \gamma_n |x|, \qquad |\Delta y| \le \gamma_n |y|.$$
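This bound can be checked numerically; the sketch below uses random vectors, computes the exact reference value with Fraction (which represents each double exactly), and tests the forward consequence $|\mathrm{fl}(x^T y) - x^T y| \le \gamma_n |x|^T |y|$ of the backward bound above.

from fractions import Fraction
import random

n = 1000
x = [random.uniform(-1, 1) for _ in range(n)]
y = [random.uniform(-1, 1) for _ in range(n)]

s_float = sum(xi * yi for xi, yi in zip(x, y))                       # double precision
s_exact = sum(Fraction(xi) * Fraction(yi) for xi, yi in zip(x, y))   # exact reference

u = Fraction(1, 2**53)                     # unit roundoff of double precision
gamma_n = n * u / (1 - n * u)
bound = gamma_n * sum(abs(Fraction(xi) * Fraction(yi)) for xi, yi in zip(x, y))
print(abs(Fraction(s_float) - s_exact) <= bound)   # expected: True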
The simplex method is among the most widely used methods for solving LP problems.
$$\begin{aligned}
\text{maximize} \quad & c^T x \\
\text{subject to} \quad & Ax = b \\
& x \ge 0
\end{aligned}$$
with $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $c \in \mathbb{R}^n$.
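As a concrete, made-up instance of this standard form (solved here with scipy.optimize.linprog purely for illustration; this is not the solver developed below):

import numpy as np
from scipy.optimize import linprog

# maximize x0 + 2*x1 subject to x0 + x1 + s = 4, all variables >= 0
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])
c = np.array([1.0, 2.0, 0.0])

# linprog minimizes, so negate the objective to maximize
res = linprog(-c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
print(res.x, -res.fun)   # [0. 4. 0.] 8.0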
We define a basis B of m columns of A. It must be non-singular.
The revised simplex method forms the basis of most practical implementations.
The steps of the simplex method loop until an optimal basis is found or the problem is detected to be unbounded.
Each iteration requires solving three linear systems involving $B$: $B x_B = b$, $B^T y = c_B$, and $B d = A_e$.
Use LU decomposition. Keep L fixed and only update U.
$$H^{(i+1)} = \begin{bmatrix} U^{(i)}_1 & \cdots & U^{(i)}_{l_i - 1} & U^{(i)}_{l_i + 1} & \cdots & U^{(i)}_m & L^{-1}A_e \end{bmatrix}$$
We need to re-triangularise $H^{(i)}$.
$$B^{(i)} = L\,G^{(i-1)}H^{(i)} = L\,\underbrace{C^{(1)}C^{(2)}\cdots C^{(i)}}_{G^{(i)}}\;\underbrace{\Gamma^{(i)}_{m-1}\Pi^{(i)}_{m-1}\cdots\Gamma^{(i)}_{l_i}\Pi^{(i)}_{l_i}\,H^{(i)}}_{U^{(i)}} = L\,G^{(i)}U^{(i)}$$
with $\big[C^{(i)}\big]^{-1} := \Gamma^{(i)}_{m-1}\Pi^{(i)}_{m-1}\cdots\Gamma^{(i)}_{l_i}\Pi^{(i)}_{l_i}$, $i > 0$, and $G^{(0)} := I$.
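A simplified numpy sketch of one such column-replacement update on random data: it keeps L fixed, forms the spiked matrix H, and removes the subdiagonal entries with Gauss eliminations. The row interchanges $\Pi$ that Bartels–Golub adds for numerical stability are omitted here, so this only illustrates the bookkeeping.

import numpy as np
from scipy.linalg import lu, solve_triangular

rng = np.random.default_rng(0)
m = 6
B = rng.standard_normal((m, m))
P, L, U = lu(B)                              # B = P @ L @ U

l, a_e = 2, rng.standard_normal(m)           # replace column l of the basis with a_e

# Spiked matrix H: drop column l of U and append (P L)^{-1} a_e as the last column
w = solve_triangular(L, P.T @ a_e, lower=True)
H = np.column_stack([np.delete(U, l, axis=1), w])

# H is upper Hessenberg from column l on: eliminate the subdiagonal entries
# (no interchanges here; Bartels-Golub would permute rows first when needed)
G = np.eye(m)
for k in range(l, m - 1):
    g = np.eye(m)
    g[k + 1, k] = -H[k + 1, k] / H[k, k]
    H, G = g @ H, g @ G
U_new = H                                    # upper triangular again

# The updated basis, with its columns reordered, factors as P L G^{-1} U_new
B_new = np.column_stack([np.delete(B, l, axis=1), a_e])
print(np.allclose(B_new, P @ L @ np.linalg.inv(G) @ U_new))   # True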
It can be proven that the Bartels–Golub decomposition is backward stable.
As long as the number of iterations is not too large, the decomposition remains accurate.
At some point it becomes more convenient to recompute the decomposition.
There exist $\tau, \epsilon \in \mathbb{R}$, with $\tau, \epsilon > 0$, such that the following floating-point simplex
def fl_simplex(A, b, c, B_idx, tolerance, epsilon):
    m, n = A.shape
    # At most n!/(n-m)! bases can be visited, so the loop terminates
    for _ in range(factorial(n) // factorial(n - m)):
        B = A[:, B_idx]
        L, G, U = bartels_golub(B)            # B = L G U
        x_B = solve(L, G, U, b)               # basic solution:  B x_B = b
        y_B = solve_t(L, G, U, c[B_idx])      # dual variables:  B^T y_B = c_B
        r = c - A.T @ y_B                     # reduced costs
        if all(r <= tolerance): return ("optimal", B_idx)
        e = argmax(r)                         # entering variable
        d = solve(L, G, U, A[:, e])           # direction:       B d = A_e
        if all(d <= tolerance): return ("unbounded", B_idx)
        l = argmin(x_B / d, where=d > tolerance)   # ratio test: leaving variable
        B_idx[l] = e
behaves exactly.
def simplex(A, b, c):
    m, n = A.shape
    # Phase 1: auxiliary problem with artificial variables to find a feasible basis
    A_f = [ A | I | -I ]
    c_f = [ 0 | 1 | 1 ]
    B_idx_f = [ n + i if b[i] >= 0 else n + m + i for i in range(m) ]
    # eps, taus: candidate precision and tolerance values
    for (ep, tau) in product(eps, taus):
        (res, B_idx) = CheckFeasibility(A_f, b, c_f, B_idx_f, tau, ep)
        if res == "infeasible": return ("infeasible", 0)
        if res == "error": continue
        (res, B_idx) = fl_simplex(A, b, c, B_idx, tau, ep)
        # Verify the candidate basis exactly (rational arithmetic)
        try:
            L, U = LU(A[:, B_idx])
        except Exception:
            continue
        x_B = solve(L, U, b)
        y_B = solve_t(L, U, c[B_idx])
        r = c - A.T @ y_B
        if all(x_B >= 0) and all(r <= 0): return ("optimal", x_B)
        elif all(x_B >= 0):
            d = solve(L, U, A[:, argmax(r)])
            if all(d <= 0): return ("unbounded", x_B)
We need to check that the result is exact.
We can do that using rational arithmetic given the last basis used.
If we are mistaken, we increase the precision and try again.
We limit the use of rational arithmetic to the minimum necessary.
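A sketch of what this exact check could look like, using Python's Fraction type (which converts each double exactly) and a small Gaussian elimination over the rationals; exact_solve and verify_basis are illustrative names, not the actual implementation:

from fractions import Fraction

def exact_solve(M, rhs):
    # Gaussian elimination over the rationals: every step is exact
    n = len(M)
    A = [[Fraction(M[i][j]) for j in range(n)] + [Fraction(rhs[i])] for i in range(n)]
    for k in range(n):
        p = next(i for i in range(k, n) if A[i][k] != 0)   # any nonzero pivot is fine exactly
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def verify_basis(A, b, c, B_idx):
    # Exact optimality check of the basis returned by the floating-point run
    m, n = len(A), len(c)
    B = [[A[i][j] for j in B_idx] for i in range(m)]
    B_T = [list(row) for row in zip(*B)]
    x_B = exact_solve(B, b)                              # B x_B = b
    y = exact_solve(B_T, [c[j] for j in B_idx])          # B^T y = c_B
    r = [Fraction(c[j]) - sum(Fraction(A[i][j]) * y[i] for i in range(m))
         for j in range(n)]                              # reduced costs c - A^T y
    return all(v >= 0 for v in x_B) and all(v <= 0 for v in r)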
Double precision or slightly higher is often enough.
Significant speedup compared to rational arithmetic.
We can stop when the result is within an interval around the optimal solution thanks to weak duality.
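For the maximization problem above, weak duality says that $c^T x \le b^T y$ whenever $x$ is primal feasible and $y$ is dual feasible ($A^T y \ge c$), so the gap $b^T y - c^T x$ bounds how far we are from the optimum. A minimal sketch (duality_gap is an illustrative name):

def duality_gap(b, c, x, y):
    # Weak duality: if A x = b, x >= 0 and A^T y >= c, then c^T x <= optimum <= b^T y
    primal = sum(ci * xi for ci, xi in zip(c, x))
    dual = sum(bi * yi for bi, yi in zip(b, y))
    return dual - primal           # stop once this is small enough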
This leads to a δ-complete solver.
Applicability in QF_LRA SMT solvers.