In this post, we present two proofs of the Kantorovich inequality, one based on probability theory and the other on optimization theory.

Let $A \in \mathbb{R}^{n \times n}$ be a given symmetric positive definite matrix and denote its eigenvalues by $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$. The Kantorovich inequality states that for every vector $x \in \mathbb{R}^n$ of unit norm, we have $$ \langle x, A x \rangle \langle x, A^{-1} x \rangle \le \frac{(\lambda_1 + \lambda_n)^2}{4 \lambda_1 \lambda_n}.$$ Such an inequality is widely used in convergence analysis (e.g., the convergence rate of the steepest descent method can be derived from the Kantorovich inequality).
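Before diving into the proofs, here is a minimal numerical illustration of the inequality (a sketch assuming NumPy is available; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Build a random symmetric positive definite matrix and a unit-norm vector.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

lam = np.linalg.eigvalsh(A)  # eigenvalues in increasing order
lhs = (x @ A @ x) * (x @ np.linalg.inv(A) @ x)
rhs = (lam[0] + lam[-1]) ** 2 / (4 * lam[0] * lam[-1])
print(lhs, rhs)  # lhs <= rhs
```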

## Proof based on probability theory

We first state three lemmas on which we will base our proof.

Lemma 1. Let $X$ and $Y$ be two jointly distributed real-valued random variables with finite second-order moments. Then $$\mathbb{E}[X] \mathbb{E}[Y] \le \mathbb{E}[XY] + \sqrt{\operatorname{Var}[X] \operatorname{Var}[Y]}.$$

Proof. Since $\mathbb{E}[XY] = \mathbb{E}[X] \mathbb{E}[Y] + \operatorname{Cov}[X, Y]$, this is a direct consequence of the Cauchy-Schwarz inequality, which provides $\lvert \operatorname{Cov}[X, Y] \rvert \le \sqrt{\operatorname{Var}[X] \operatorname{Var}[Y]}$.
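As a sanity check, Lemma 1 can be verified exactly on a small discrete joint distribution (a sketch assuming NumPy; the distribution below is arbitrary):

```python
import numpy as np

# Joint distribution: P(X = xs[i], Y = ys[i]) = p[i].
xs = np.array([0.0, 1.0, 2.0, 5.0])
ys = np.array([3.0, -1.0, 4.0, 0.5])
p = np.array([0.1, 0.4, 0.3, 0.2])

ex, ey, exy = p @ xs, p @ ys, p @ (xs * ys)
var_x = p @ (xs - ex) ** 2
var_y = p @ (ys - ey) ** 2

# Lemma 1: E[X] E[Y] <= E[XY] + sqrt(Var[X] Var[Y]).
assert ex * ey <= exy + np.sqrt(var_x * var_y)
```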

Lemma 2. (Popoviciu’s inequality) Let $X$ be a real-valued random variable taking values in $[\alpha, \beta]$, with $\alpha \le \beta$. Then $$\operatorname{Var}[X] \le \frac{(\beta - \alpha)^2}{4}.$$

Proof. Let $\varphi$ be the univariate function defined by $\varphi(t) = \mathbb{E}[(X - t)^2]$. It reaches its minimum at $t = \mathbb{E}[X]$, so that $$\operatorname{Var}[X] = \varphi(\mathbb{E}[X]) \le \varphi \bigg(\frac{\alpha + \beta}{2}\bigg) = \frac{\mathbb{E}[(2X - \alpha - \beta)^2]}{4}.$$ Remarking that $(X - \alpha)(X - \beta) \le 0$, we have $(2X - \alpha - \beta)^2 = (\beta - \alpha)^2 + 4 (X - \alpha)(X - \beta) \le (\beta - \alpha)^2$, and we thus obtain the desired result.
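A quick numerical illustration of Popoviciu's inequality (a sketch assuming NumPy; the discrete distribution is arbitrary):

```python
import numpy as np

alpha, beta = 1.0, 9.0
# A discrete distribution supported on [alpha, beta].
vals = np.array([1.0, 2.5, 4.0, 8.0, 9.0])
p = np.array([0.2, 0.1, 0.3, 0.25, 0.15])

var = p @ (vals - p @ vals) ** 2
print(var, (beta - alpha) ** 2 / 4)  # Var[X] <= (beta - alpha)^2 / 4
```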

Lemma 3. Let $X$ be a real-valued random variable taking values in $[\alpha, \beta]$, with $0 < \alpha \le \beta$. Then $$\mathbb{E}[X] \mathbb{E}[X^{-1}] \le \frac{(\alpha + \beta)^2}{4 \alpha \beta}.$$

Proof. Lemma 1, combined with Lemma 2 applied to $X$ and to $X^{-1}$ (which takes its values in $[\beta^{-1}, \alpha^{-1}]$), provides $$\mathbb{E}[X] \mathbb{E}[X^{-1}] \le \mathbb{E}[XX^{-1}] + \sqrt{\operatorname{Var}[X] \operatorname{Var}[X^{-1}]} \le 1 + \sqrt{\frac{(\beta - \alpha)^2 (\alpha^{-1} - \beta^{-1})^2}{16}} = \frac{(\alpha + \beta)^2}{4 \alpha \beta}.$$
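The bound of Lemma 3 is tight: it is attained by the two-point distribution that puts mass $1/2$ at $\alpha$ and at $\beta$. A minimal check (assuming NumPy):

```python
import numpy as np

alpha, beta = 0.5, 4.0
# Two-point distribution with equal mass at the endpoints: extremal case.
vals = np.array([alpha, beta])
p = np.array([0.5, 0.5])

lhs = (p @ vals) * (p @ (1.0 / vals))
rhs = (alpha + beta) ** 2 / (4 * alpha * beta)
assert np.isclose(lhs, rhs)  # equality for this distribution
```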

We are now ready to prove the Kantorovich inequality. We let $x \in \mathbb{R}^n$ be any vector of unit norm. Since $A$ is symmetric positive definite, it admits an eigendecomposition $A = Q \Lambda Q^{\mathsf{T}}$ with $Q \in \mathbb{R}^{n \times n}$ an orthogonal matrix and $\Lambda = \operatorname{diag} \{ \lambda_1, \dots, \lambda_n \}$. We then have $$\langle x, A x \rangle = \sum_{i = 1}^n \lambda_i v_i^2 \quad \text{and} \quad \langle x, A^{-1} x \rangle = \sum_{i = 1}^n \lambda_i^{-1} v_i^2,$$ where $v_i$ denotes the $i$th component of $Q^{\mathsf{T}} x$. Let $X$ be a random variable that takes the value $\lambda_i$ with probability $v_i^2$ for $i \in \{ 1, \dots, n \}$; this defines a valid probability distribution, since $\sum_{i = 1}^n v_i^2 = \lVert Q^{\mathsf{T}} x \rVert^2 = \lVert x \rVert^2 = 1$. Since $X$ takes its values in $[\lambda_1, \lambda_n]$, according to Lemma 3, we have $$\langle x, A x \rangle \langle x, A^{-1} x \rangle = \mathbb{E}[X] \mathbb{E}[X^{-1}] \le \frac{(\lambda_1 + \lambda_n)^2}{4 \lambda_1 \lambda_n},$$ which concludes our proof of the Kantorovich inequality based on probability theory.
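The construction used in this proof translates directly into code: the weights $v_i^2$ define a probability distribution on the eigenvalues, and the two quadratic forms are exactly $\mathbb{E}[X]$ and $\mathbb{E}[X^{-1}]$. A minimal sketch (assuming NumPy; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)   # symmetric positive definite
x = rng.standard_normal(n)
x /= np.linalg.norm(x)    # unit norm

lam, Q = np.linalg.eigh(A)  # A = Q diag(lam) Q^T, lam in increasing order
prob = (Q.T @ x) ** 2       # sums to 1 since Q is orthogonal and ||x|| = 1

ex, einv = prob @ lam, prob @ (1.0 / lam)
assert np.isclose(ex, x @ A @ x)
assert np.isclose(einv, x @ np.linalg.inv(A) @ x)
print(ex * einv, (lam[0] + lam[-1]) ** 2 / (4 * lam[0] * lam[-1]))
```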

## Proof based on optimization theory

We let $x \in \mathbb{R}^n$ be any vector of unit norm and $\varphi$ be the univariate function defined on $(0, \infty)$ by $\varphi(t) = t \langle x, A x \rangle + t^{-1} \langle x, A^{-1} x \rangle$. The minimum of $\varphi$ is reached at $t = \sqrt{\langle x, A^{-1} x \rangle \langle x, A x \rangle^{-1}}$ and equals $2 \sqrt{\langle x, A x \rangle \langle x, A^{-1} x \rangle}$, providing, for every $t > 0$, $$\sqrt{\langle x, A x \rangle \langle x, A^{-1} x \rangle} \le \frac{1}{2} \varphi(t) = \frac{1}{2} \langle x, (t A + t^{-1} A^{-1}) x \rangle.$$ According to the Courant-Fischer theorem, the right-hand side is at most half the largest eigenvalue of $t A + t^{-1} A^{-1}$, namely $\frac{1}{2} \max_{i = 1, 2, \dots, n} (t \lambda_i + t^{-1} \lambda_i^{-1})$; moreover, since $\lambda \mapsto t \lambda + t^{-1} \lambda^{-1}$ is convex on $(0, \infty)$, this maximum is attained at $\lambda_1$ or $\lambda_n$. We thus have $$\sqrt{\langle x, A x \rangle \langle x, A^{-1} x \rangle} \le \frac{1}{2} \max \{ t \lambda_1 + t^{-1} \lambda_1^{-1}, t \lambda_n + t^{-1} \lambda_n^{-1} \}.$$ The evaluation of the above right-hand side at $t = 1 / \sqrt{\lambda_1 \lambda_n}$ provides $$ \sqrt{\langle x, A x \rangle \langle x, A^{-1} x \rangle} \le \frac{\lambda_1 + \lambda_n}{2 \sqrt{\lambda_1 \lambda_n}},$$ which, once squared, is exactly the Kantorovich inequality and concludes our proof based on optimization theory.
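Numerically, this chain of inequalities can be traced step by step: the minimum value of $\varphi$ is $2 \sqrt{\langle x, A x \rangle \langle x, A^{-1} x \rangle}$, and the choice $t = 1 / \sqrt{\lambda_1 \lambda_n}$ turns the upper bound into the Kantorovich constant. A minimal sketch (assuming NumPy; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)   # symmetric positive definite
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

qa = x @ A @ x
qi = x @ np.linalg.inv(A) @ x
lam = np.linalg.eigvalsh(A)

def phi(t):
    return t * qa + qi / t

t_star = np.sqrt(qi / qa)
assert np.isclose(phi(t_star), 2 * np.sqrt(qa * qi))  # minimum value of phi

t = 1 / np.sqrt(lam[0] * lam[-1])
bound = (lam[0] + lam[-1]) / (2 * np.sqrt(lam[0] * lam[-1]))
assert np.sqrt(qa * qi) <= phi(t) / 2
assert phi(t) / 2 <= bound + 1e-12
```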