Farkas' lemma and related stories

Farkas' lemma plays a central role in the strong duality theorem in linear programming (another post will unravel this implication in detail). There are many equivalent ways to state this lemma, but the following one will allow us to highlight similarities with other theorems. Formally, we consider a system of linear inequalities of the form

\[l_1(x) \geq 0, \dots, l_m(x) \geq 0\]

where \(x\) lies in \(\mathbb{R}^d\) and the \(l_i \in \mathbb{R}[x]\) are polynomials of degree at most \(1\). Farkas' lemma says that exactly one of the following holds:

  • (1) the system \(l_1(x) \geq 0, \dots, l_m(x) \geq 0\) has a solution in \(\mathbb{R}^d\),
  • (2) there are \(m\) non-negative real numbers \(q_1\geq 0,\dots, q_m \geq 0\) such that \(-1 = \sum_i q_i.l_i(x)\).

Observe that (2) is a certificate that the system has no solution since otherwise we would get \(-1 \geq 0\), a contradiction. There is a nice geometric implication of this lemma: two disjoint convex polytopes in \(\mathbb{R}^d\) can be separated by a hyperplane (left as an exercise to the reader :) ).
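To see the certificate at work on a toy example of mine: the system \(l_1(x) = x - 1 \geq 0\), \(l_2(x) = -x \geq 0\) in \(\mathbb{R}\) is clearly infeasible, and taking \(q_1 = q_2 = 1\) gives \(q_1.l_1(x) + q_2.l_2(x) = (x - 1) + (-x) = -1\), exactly the certificate promised by (2).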

--

Proof of Farkas' lemma

By induction on \(d\).

  • For \(d = 1\) (the case \(d = 0\) is true but so easy that it is not instructive), the inequalities are, up to rescaling by a positive constant, of three different forms:
  1. \(l_{a_i}(x) = a_i - x \geq 0\)
  2. \(l_{b_j}(x) = x - b_j \geq 0\)
  3. \(l_{c_k}(x) = c_k \geq 0\)

where \(a_i,b_j,c_k\in \mathbb{R}\). Let us forget the third kind of inequalities: either they are all trivially true, or at least one is trivially false, which gives an \(l_i\) that is a negative constant and therefore, up to rescaling by a positive constant, equal to \(-1\), as wanted. It is easy to see that a system of inequalities of types 1 and 2 is satisfiable if and only if

\[\max_j b_j \leq \min_i a_i \]

If this is the case, any \(x\) with \(\max_j b_j \leq x \leq \min_i a_i\) is a solution. Otherwise, take \(a_{i_0} < b_{j_0}\) and observe that \(\frac{1}{(b_{j_0} - a_{i_0})}(l_{a_{i_0}}(x) + l_{b_{j_0}}(x)) = -1\), as wanted.
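For instance, with \(a_{i_0} = 2\) and \(b_{j_0} = 5\), the combination reads \(\frac{1}{3}\big((2 - x) + (x - 5)\big) = \frac{-3}{3} = -1\), a non-negative combination of the \(l_i\) witnessing infeasibility.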

  • For \(d \geq 2\), we write a vector \(x \in \mathbb{R}^d\) as \(x = (x_1,x_2) \in \mathbb{R} \times \mathbb{R}^{d-1}\). Again, the inequalities are, up to rescaling by a positive constant, of three different forms:
  1. \(l_{a_i}(x) = a_i(x_2) - x_1 \geq 0\)
  2. \(l_{b_j}(x) = x_1 - b_j(x_2) \geq 0\)
  3. \(l_{c_k}(x) = c_k(x_2) \geq 0\)

where \(a_i,b_j,c_k\) are polynomials of degree at most \(1\) in the \(d-1\) variables \(x_2\). As in the case \(d=1\), the system has a solution if and only if it is possible to find \(x_2\) for which \(\max_j b_j(x_2) \leq \min_i a_i(x_2)\) and \(c_k(x_2) \geq 0\) for all \(k\). Equivalently, the system has a solution if and only if the inequalities \(a_i(x_2)-b_j(x_2) \geq 0\) for all \(i,j\) and \(c_k(x_2) \geq 0\) for all \(k\) are satisfiable. If this new system is satisfiable, that's great: we get a solution of the original one. Otherwise, by the induction hypothesis, we can write \(-1\) as a non-negative linear combination of the \(c_k\) and the \(a_i-b_j\). To conclude, just observe that \(c_k = l_{c_k}\) and \(a_i-b_j = l_{a_i} + l_{b_j}\), so this is also a non-negative combination of the original \(l_i\).
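Readers familiar with linear programming will recognize this induction step as one round of Fourier-Motzkin elimination. Here is a minimal sketch in Python (the helper name and the convention that a row \((A_r, c_r)\) encodes the inequality \(A_r.x + c_r \geq 0\) are my own):

```python
import numpy as np

def eliminate_first_variable(A, c):
    """One Fourier-Motzkin step: from the system A @ x + c >= 0 in d variables,
    return an equivalent system in the remaining d - 1 variables (x_2 in the text)."""
    rows = np.column_stack([A, c])
    uppers = [r / -r[0] for r in rows if r[0] < 0]  # rescaled to  a_i(x_2) - x_1 >= 0
    lowers = [r / r[0] for r in rows if r[0] > 0]   # rescaled to  x_1 - b_j(x_2) >= 0
    frees  = [r for r in rows if r[0] == 0]         # c_k(x_2) >= 0, x_1 already absent
    # l_{a_i} + l_{b_j} = a_i(x_2) - b_j(x_2) >= 0 no longer involves x_1, and only
    # positive rescalings and additions were used, exactly as in the proof above.
    new_rows = [u + l for u in uppers for l in lowers] + frees
    return np.array([r[1:] for r in new_rows])

# Example: x_1 - x_2 >= 0 and 1 - x_1 - x_2 >= 0; eliminating x_1 yields 1 - 2 x_2 >= 0.
A = np.array([[1.0, -1.0], [-1.0, -1.0]])
c = np.array([0.0, 1.0])
print(eliminate_first_variable(A, c))  # [[-2.  1.]] encodes -2 x_2 + 1 >= 0
```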

--

Going back to the 19th century, we can see that a similar situation arises in the classical (weak) Hilbert Nullstellensatz. Indeed, given any algebraically closed field \(\mathbb{K}\) such as the complex numbers \(\mathbb{C}\), the Nullstellensatz tells us that a set of polynomials \(f_1,\dots, f_m \in \mathbb{K}[x]\) either has a common root in \(\mathbb{K}^d\) or there exist \(m\) polynomials \(q_1,\dots, q_m \in \mathbb{K}[x]\) such that \(1 = \sum_i q_i.f_i \) (in words, \(1\) is in the ideal generated by the \(f_i\)). Note that the polynomials \(q_i\) can be seen as a certificate that the system of polynomial equations \(f_1(x) = 0, \dots, f_m(x) = 0\) has no solution. Such a certificate can easily be checked in polynomial time, and the Nullstellensatz therefore naturally led to the definition of several proof systems, such as the Nullstellensatz proof system or polynomial calculus resolution (the latter captures Gröbner basis computations). These proof systems can be used, for example, for refuting CNF formulas.
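Checking such a certificate is indeed mechanical: it is plain polynomial arithmetic. Here is a minimal sketch with sympy (the system \(f_1 = x\), \(f_2 = x - 1\) and its certificate \(q_1 = 1\), \(q_2 = -1\) are a toy example of mine):

```python
import sympy as sp

x = sp.symbols('x')
f1, f2 = x, x - 1                        # no common root: x = 0 and x = 1 are incompatible
q1, q2 = sp.Integer(1), sp.Integer(-1)   # candidate certificate polynomials

# Verifying the certificate is just expanding a polynomial identity:
assert sp.expand(q1*f1 + q2*f2) == 1     # so 1 lies in the ideal generated by f1 and f2

# A Groebner basis computation finds the same refutation: the basis collapses to {1}.
print(sp.groebner([f1, f2], x))          # GroebnerBasis([1], x, domain='ZZ', order='lex')
```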

While Farkas' lemma works over an ordered field such as the real numbers and deals with inequalities, the Nullstellensatz handles more complicated objects (polynomials of any degree) but only applies to algebraically closed fields and to equations. A natural question is whether we can combine the best of the two worlds and construct similar certificates for polynomial equations and inequalities over the real field. Let me kill the suspense directly: it is possible, and this is the purpose of the Positivstellensatz!

The Positivstellensatz says that a system of (real) polynomial equations and inequalities of the form

\[f_1(x) \geq 0, \dots, f_m(x) \geq 0, h_1(x) = 0, \dots, h_k(x) = 0\]

has no solution over \(\mathbb{R}^d\) if and only if there exist polynomials \(q_I,r_j \in \mathbb{R}[x]\) such that the \(q_I\) are sums of squares and such that

\[-1 = \sum_{I \subset [m]} q_I.\prod_{i \in I} f_i + \sum_j r_j.h_j\]

(in words, we say that \(-1\) is in the sum of the cone generated by the \(f_i\) and the ideal generated by the \(h_j\)).
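As a toy instance (my own): the system \(f_1(x) = x \geq 0\), \(h_1(x) = x + 1 = 0\) clearly has no real solution, and indeed

\[-1 = 1.f_1 + (-1).h_1\]

is a certificate of the announced shape, with \(q_{\{1\}} = 1\) (a sum of squares) and \(r_1 = -1\).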

This theorem plays an important role in polynomial optimization. Indeed, suppose you have an optimization problem of the form

\[(\star) \ \ \ \ \max f(x) \text{ subject to } x \in \mathcal{S} \]

where \(f\) is a polynomial and \(\mathcal{S}\) is a semi-algebraic set (i.e., defined by polynomial equations and inequalities over \(\mathbb{R}^d\)). Then, proving that \(\gamma\) is an upper bound on the optimal value of \((\star)\) is essentially equivalent to proving that

\[ \forall x \in \mathcal{S}, \gamma - f(x) > 0\]

which reduces to a Positivstellensatz instance (i.e., to finding a certificate that the system \(\gamma - f(x) \leq 0\) has no solution in \(\mathcal{S}\)). For instance, with \(\mathcal{S} = \mathbb{R}\) and \(f(x) = x - x^2\), the bound \(\gamma = \frac{1}{4}\) is certified by the sum-of-squares identity \(\frac{1}{4} - x + x^2 = (x - \frac{1}{2})^2\). Observe that we can very easily encode NP-hard problems as optimization problems of the form \((\star)\). For example, the stable set problem, which asks for a largest set \(S\) of vertices of a graph \(G = (V,E)\) such that no two vertices in \(S\) are adjacent in \(G\), can be encoded in the following way: we have \(|V|\) variables \(x_i\) (one for each vertex), and we want to solve the following problem:

\[ \max \sum_i x_i \text{ subject to } \forall (i,j) \in E,\ x_ix_j = 0,\ \forall i,\ x_i^2 = x_i\]
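As a quick sanity check of this encoding, here is a small sympy sketch (the triangle graph and all names are a toy example of mine; it verifies the encoding by brute force over \(\{0,1\}^{|V|}\), not by computing a Positivstellensatz certificate):

```python
from itertools import product
import sympy as sp

# Triangle K3: every pair of vertices is adjacent, so a maximum stable set has size 1.
x1, x2, x3 = sp.symbols('x1 x2 x3')
variables = [x1, x2, x3]
edges = [(x1, x2), (x1, x3), (x2, x3)]

objective = sum(variables)                        # sum_i x_i
constraints = [xi*xj for xi, xj in edges]         # x_i.x_j = 0 on each edge
constraints += [xi**2 - xi for xi in variables]   # x_i^2 = x_i forces x_i in {0, 1}

# The boolean constraints make the feasible set finite, so we can enumerate it.
best = max(objective.subs(dict(zip(variables, v)))
           for v in product([0, 1], repeat=3)
           if all(c.subs(dict(zip(variables, v))) == 0 for c in constraints))
print(best)  # 1, the stability number of the triangle
```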