Convex Optimization Notes

4. Lecture 4: Convex Conjugates and Marginal Duality

Lecture 3 established strong duality in the polyhedral setting of linear programming. Lecture 4 passes from linear inequality certificates to the duality of convex functions and their epigraphs. The main geometric idea is that the conjugate records the affine functions lying below f. For each \xi\in E^* with f^*(\xi)<+\infty, the affine function

x\mapsto \langle \xi,x\rangle-f^*(\xi)

is the tightest such affine lower bound with slope \xi. Therefore the biconjugate

f^{**}(x)=\sup_{\xi\in E^*}\{\langle \xi,x\rangle-f^*(\xi)\}

is the supremum of all affine functions lying below f. Geometrically, each affine lower bound corresponds to a non-vertical closed halfspace containing \operatorname{epi}(f), so Fenchel--Moreau will be proved below from the fact that a closed convex set is the intersection of the closed halfspaces containing it.

4.1. Conjugates and Biconjugation

Definition
4.1 Convex conjugate and biconjugate

Let f:E\to \mathbb{R}\cup\{+\infty\} be proper. Its convex conjugate is defined by

\forall \xi\in E^*,\qquad f^*(\xi):=\sup_{x\in E}\{\langle \xi,x\rangle-f(x)\}.

Under the present convention, this keeps f^* valued in \mathbb{R}\cup\{+\infty\}. If one allowed an improper function such as f\equiv +\infty, then the same formula would give f^*\equiv -\infty, which lies outside the current extended-value convention.

Let

\iota_E:E\to E^{**}, \qquad (\iota_E(x))(\xi):=\langle \xi,x\rangle.

Since E is finite-dimensional, \iota_E is an isomorphism. We therefore view the biconjugate back on E through this natural identification and write

f^{**}(x):=(f^*)^*(\iota_E(x)) = \sup_{\xi\in E^*}\{\langle \xi,x\rangle-f^*(\xi)\} \qquad (x\in E).
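As a numerical sanity check of the definition, the following minimal sketch approximates the conjugate on E=\mathbb{R} by replacing the supremum with a maximum over a finite grid. The helper name conjugate_on_grid, the grid bounds, and the test function f(x)=x^2/2 (whose conjugate is \xi^2/2) are illustrative assumptions and are not part of the Lean development behind these notes.

```python
def conjugate_on_grid(f, xs):
    """Return an approximation of f^* obtained by maximizing over the grid xs."""
    def f_star(xi):
        return max(xi * x - f(x) for x in xs)
    return f_star

# Hypothetical test function: f(x) = x^2 / 2, whose conjugate is xi^2 / 2.
f = lambda x: 0.5 * x * x
xs = [i / 100 for i in range(-500, 501)]   # grid on [-5, 5] with step 0.01
f_star = conjugate_on_grid(f, xs)

for xi in (-2.0, 0.0, 1.5):
    # Print the slope, the grid approximation, and the closed form.
    print(xi, f_star(xi), 0.5 * xi * xi)
```

A grid maximum can only underestimate the true supremum, so the approximation is reliable only when the maximizer lies well inside the grid.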

Definition
4.2 Closure and closed extended-value function

Let f:E\to \mathbb{R}\cup\{+\infty\}. The closure of f, denoted \operatorname{cl} f, is the function characterized by

\operatorname{epi}(\operatorname{cl} f)=\operatorname{cl}(\operatorname{epi}(f)).

We say that f is closed if \operatorname{epi}(f) is a closed subset of E\times \mathbb{R}, equivalently if \operatorname{cl} f=f. In finite dimensions, this is equivalent to lower semicontinuity.

Lemma
4.1 Fenchel--Young inequality and equality case

Let f:E\to \mathbb{R}\cup\{+\infty\} be proper. Then

\forall x\in E,\ \forall \xi\in E^*,\qquad f(x)+f^*(\xi)\ge \langle \xi,x\rangle.

If in addition f is closed and convex, then for every x\in E and every \xi\in E^*,

f(x)+f^*(\xi)=\langle \xi,x\rangle \iff \xi\in \partial f(x).

Proof

By definition of the conjugate,

f^*(\xi)=\sup_{z\in E}\{\langle \xi,z\rangle-f(z)\}\ge \langle \xi,x\rangle-f(x),

which rearranges to

f(x)+f^*(\xi)\ge \langle \xi,x\rangle.

Assume now that f is proper, closed, and convex. Equality holds if and only if

f^*(\xi)=\langle \xi,x\rangle-f(x).

By the definition of f^*, this is equivalent to

\forall z\in E,\qquad \langle \xi,z\rangle-f(z)\le \langle \xi,x\rangle-f(x),

that is,

\forall z\in E,\qquad f(z)\ge f(x)+\langle \xi,z-x\rangle.

This is exactly the statement that \xi\in \partial f(x).

Formal Statement and Proof

Lean theorems: Lecture04.lem_l4_fy, Lecture04.lem_l4_fy_eq_iff_subgradient.

theorem Lecture04_lem_l4_fy {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
    {f : E → EReal} (hf : IsProper f) (x : E) (ξ : Covector E) :
    f x + fenchelConjugate f ξ ≥ (((ξ x : ℝ) : EReal))

theorem Lecture04_lem_l4_fy_eq_iff_subgradient {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
    {f : E → EReal} (hf : IsProper f) {x : E} {ξ : Covector E} :
    f x + fenchelConjugate f ξ = (((ξ x : ℝ) : EReal)) ↔ Lecture02.IsERealSubgradient f x ξ

By definition, f^*(\xi) is the smallest shift a for which the affine function z\mapsto \langle \xi,z\rangle-a lies below f everywhere. Evaluating z\mapsto \langle \xi,z\rangle-f^*(\xi) at z=x gives the inequality, and equality means that this affine lower bound touches f at x, which is exactly the subgradient condition.
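The inequality and its equality case can be checked numerically. The sketch below uses f(x)=e^x together with the closed form of its conjugate from Example 4.4; the function names and the grid are assumptions chosen only for illustration.

```python
import math

def f(x):
    return math.exp(x)

def f_star(xi):
    # Closed form from Example 4.4, with the convention 0*log(0) := 0.
    if xi < 0:
        return math.inf
    return xi * math.log(xi) - xi if xi > 0 else 0.0

def fenchel_young_gap(x, xi):
    """f(x) + f^*(xi) - xi*x, which Lemma 4.1 says is nonnegative."""
    return f(x) + f_star(xi) - xi * x

xs = [i / 10 for i in range(-30, 31)]     # x in [-3, 3]
xis = [j / 10 for j in range(0, 51)]      # xi in [0, 5]
min_gap = min(fenchel_young_gap(x, xi) for x in xs for xi in xis)

# Equality case: xi = f'(x) = e^x is the subgradient of f at x.
touch_gap = fenchel_young_gap(1.0, math.exp(1.0))
print(min_gap, touch_gap)
```

Up to rounding, the gap is nonnegative on the whole grid and vanishes when \xi=e^x, which is the only subgradient of f at x.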

Lemma
4.2

Let f:E\to \mathbb{R}\cup\{+\infty\} be proper. Then

\forall x\in E,\qquad f^{**}(x)\le f(x).

Proof

For every x\in E and every \xi\in E^*, Lemma 4.1 gives

\langle \xi,x\rangle-f^*(\xi)\le f(x).

Taking the supremum over \xi yields

f^{**}(x)\le f(x) \qquad \forall x\in E.

Formal Statement and Proof

Lean theorem: Lecture04_lem_l4_biconj_le.

theorem Lecture04_lem_l4_biconj_le {E : Type*} [NormedAddCommGroup E] [NormedSpace ℝ E]
    {f : E → EReal} (hf : IsProper f) (x : E) :
    fenchelBiconjugate f x ≤ f x

This can be read as a weak-duality statement for biconjugation: f^{**} always gives a lower bound on f. Unlike Theorem 4.3, this direction does not require convexity or closedness.
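To see that the inequality can be strict without convexity, the following sketch computes a grid biconjugate of a nonconvex double-well function. The function f, the grid, and the helper names are hypothetical choices made for illustration; the suprema are replaced by maxima over the grid.

```python
def conjugate_on_grid(f, points):
    """Return the grid conjugate s -> max over x in points of (s*x - f(x))."""
    return lambda s: max(s * x - f(x) for x in points)

# Hypothetical nonconvex example: a double well with minima at x = -1 and x = 1.
def f(x):
    return min((x - 1.0) ** 2, (x + 1.0) ** 2)

grid = [i / 20 for i in range(-80, 81)]          # [-4, 4] with step 0.05
f_star = conjugate_on_grid(f, grid)
f_star_table = {xi: f_star(xi) for xi in grid}   # cache f^* on the dual grid

def f_bistar(x):
    """Grid biconjugate: max over xi in grid of (xi*x - f^*(xi))."""
    return max(xi * x - f_star_table[xi] for xi in grid)

print(f(0.0), f_bistar(0.0))   # f(0) = 1, while the biconjugate at 0 is about 0
assert all(f_bistar(x) <= f(x) + 1e-9 for x in grid)
```

Here f^{**} fills in the region between the two wells, so f^{**}(0)<f(0), while f^{**}\le f still holds at every grid point.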

Theorem
4.3 Fenchel--Moreau

Let E be a finite-dimensional real normed space, and let f:E\to \mathbb{R}\cup\{+\infty\} be proper, closed, and convex. Then

\forall x\in E,\qquad f^{**}(x)=f(x).

Proof

By Lemma 4.2, it remains to prove the reverse inequality

f(x)\le f^{**}(x) \qquad \forall x\in E.

Fix x_0\in E and t_0<f(x_0). Then (x_0,t_0)\notin \operatorname{epi}(f). Because f is proper, closed, and convex, the epigraph \operatorname{epi}(f) is a nonempty closed convex subset of E\times \mathbb{R}. By Theorem 2.6, there exists a closed halfspace containing \operatorname{epi}(f) but not (x_0,t_0).

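Since \operatorname{epi}(f) is upward closed in the t-direction, the separating halfspace is either non-vertical or vertical, that is, of the form \{(x,t):\langle \eta,x\rangle\le b\}. A vertical halfspace can occur only when x_0\notin \operatorname{dom} f. In that case, pick x_1\in \operatorname{dom} f and t_1<f(x_1); a halfspace separating (x_1,t_1) from \operatorname{epi}(f) must be non-vertical, so it yields an affine function \ell\le f. Then \ell+\mu(\langle \eta,\cdot\rangle-b) is still an affine lower bound of f for every \mu\ge 0, and its value at x_0 exceeds t_0 once \mu is large, because \langle \eta,x_0\rangle>b. In either case we may therefore take the halfspace in the form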

H_{\xi,a}:=\{(x,t)\in E\times \mathbb{R}: t\ge \langle \xi,x\rangle-a\}

for some \xi\in E^* and a\in \mathbb{R}. Thus

\operatorname{epi}(f)\subseteq H_{\xi,a} \qquad\text{and}\qquad t_0<\langle \xi,x_0\rangle-a.

The containment \operatorname{epi}(f)\subseteq H_{\xi,a} is equivalent to saying that the affine function

\ell_{\xi,a}(x):=\langle \xi,x\rangle-a

is a global affine lower bound for f, that is,

\ell_{\xi,a}(x)\le f(x) \qquad \forall x\in E.

For fixed slope \xi, the smallest admissible intercept is exactly f^*(\xi), because

\ell_{\xi,a}(x)\le f(x)\ \forall x \iff a\ge \sup_{x\in E}\{\langle \xi,x\rangle-f(x)\}=f^*(\xi).

Hence every affine lower bound of slope \xi lies below the tight affine function of the same slope,

x\mapsto \langle \xi,x\rangle-f^*(\xi).

In particular,

t_0<\langle \xi,x_0\rangle-a\le \langle \xi,x_0\rangle-f^*(\xi)\le f^{**}(x_0).

Since this holds for every t_0<f(x_0), we conclude that

f(x_0)\le f^{**}(x_0).

Combined with Lemma 4.2, this gives

\forall x\in E,\qquad f^{**}(x)=f(x).

Formal Statement and Proof

Lean theorem: Lecture04.thm_l4_fm.

theorem Lecture04_thm_l4_fm {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E]
    [FiniteDimensional ℝ E] (f : E → EReal) (hf_proper : IsProper f)
    (hf_closed : LowerSemicontinuous f) (hf_convex : EConvexOn Set.univ f) :
    fenchelBiconjugate f = f

More generally, if f is proper and convex, then

(\operatorname{cl} f)^*=f^*.

Applying Theorem 4.3 to \operatorname{cl} f therefore gives

f^{**}=(\operatorname{cl} f)^{**}=\operatorname{cl} f.
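A small worked example, not taken from the notes, shows that the closure in this formula cannot be dropped. The function f below is a hypothetical illustration on E=\mathbb{R}: it is proper and convex but fails to be closed at the endpoints x=\pm 1. The computation uses the indicator function of Definition 4.3 and the pair from Example 4.2.

```latex
f(x):=\begin{cases} 0,&|x|<1,\\ 1,&|x|=1,\\ +\infty,&|x|>1, \end{cases}
\qquad
f^*(\xi)=\max\Bigl\{\sup_{|x|<1}\xi x,\ |\xi|-1\Bigr\}=|\xi|,
\qquad
f^{**}=|\cdot|^*=\delta_{[-1,1]}=\operatorname{cl} f.
```

In particular f^{**}(\pm 1)=0<1=f(\pm 1), so f^{**} and f differ exactly at the points where f is not lower semicontinuous.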

4.2. Basic Conjugate Pairs

With the structural theorem in place, it is useful to collect a few conjugate pairs that will recur later in the course.

Example
4.1 A one-dimensional conjugate pair

Let p,q>1 satisfy \frac1p+\frac1q=1, and define f(x):=\frac{|x|^p}{p} on \mathbb{R}. Then f^*(\xi)=\frac{|\xi|^q}{q} for \xi\in \mathbb{R}. Thus

\left(\frac{|x|^p}{p}\right)^*=\frac{|\xi|^q}{q}.

This is the basic one-dimensional power-law conjugate pair. Applying Lemma 4.1 to this pair gives

x\xi\le \frac{|x|^p}{p}+\frac{|\xi|^q}{q}

for all x,\xi\in \mathbb{R}, and in particular

ab\le \frac{a^p}{p}+\frac{b^q}{q}

for a,b\ge 0. This is the classical Young inequality, which explains the Young part of the name Fenchel--Young inequality.

Proof

By definition,

f^*(\xi)=\sup_{x\in \mathbb{R}}\left\{x\xi-\frac{|x|^p}{p}\right\}.

The maximizer has the same sign as \xi, so it is enough to consider x\ge 0 and \xi\ge 0. For fixed \xi\ge 0, set \varphi_\xi(x):=x\xi-\frac{x^p}{p} on x\ge 0. Since \varphi_\xi is concave with \varphi_\xi'(x)=\xi-x^{p-1}, its unique critical point x=\xi^{1/(p-1)}=\xi^{q-1} is the maximizer. Substituting gives

f^*(\xi) = \xi\cdot \xi^{q-1}-\frac{\xi^{p(q-1)}}{p} = \xi^q-\frac{\xi^q}{p} = \frac{\xi^q}{q}.

By symmetry, f^*(\xi)=\frac{|\xi|^q}{q} for all \xi\in \mathbb{R}. Applying Lemma 4.1 to this pair then gives the classical Young inequality stated above.
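The closed form can be compared with a brute-force evaluation of the supremum. The sketch below assumes the exponent p=3 (so q=3/2) and a hypothetical helper power_conjugate_on_grid; the search radius and step count are arbitrary choices that happen to contain the maximizers for the slopes tested.

```python
def power_conjugate_on_grid(xi, p, radius=10.0, steps=20000):
    """Approximate sup over x of (x*xi - |x|^p / p) by a maximum over a uniform grid."""
    best = float("-inf")
    for k in range(steps + 1):
        x = -radius + 2.0 * radius * k / steps
        best = max(best, x * xi - abs(x) ** p / p)
    return best

p = 3.0
q = p / (p - 1.0)            # conjugate exponent, so 1/p + 1/q = 1
for xi in (-2.0, 0.5, 2.0):
    approx = power_conjugate_on_grid(xi, p)
    exact = abs(xi) ** q / q
    # Print the slope, the grid approximation, and |xi|^q / q.
    print(xi, approx, exact)
```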

Definition
4.3 Indicator function

Let C\subseteq E. The indicator function of C is the function \delta_C:E\to \mathbb{R}\cup\{+\infty\} defined by

\delta_C(x):= \begin{cases} 0,&x\in C,\\ +\infty,&x\notin C. \end{cases}

Example
4.2 Indicator functions and support functions

Let C\subseteq E be nonempty, and define its support function by

\sigma_C(\xi):=\sup_{x\in C}\langle \xi,x\rangle \qquad (\xi\in E^*).

Then

(\delta_C)^*=\sigma_C.

In one dimension, if C=[-1,1]\subseteq \mathbb{R}, then

\sigma_C(\xi)=\sup_{x\in[-1,1]} \xi x=|\xi|.

Thus

(\delta_{[-1,1]})^*=|\cdot|.

Since \delta_{[-1,1]} is proper, closed, and convex, Theorem 4.3 gives (\delta_{[-1,1]})^{**}=\delta_{[-1,1]}, and therefore

|\cdot|^*=\delta_{[-1,1]}.

So the pair |x| and \delta_{[-1,1]} is the one-dimensional special case of the general indicator/support-function correspondence.

Example
4.3 Norms and dual unit balls

Let \|\cdot\| be a norm on E, and let \|\cdot\|_* be its dual norm on E^*, defined by

\|\xi\|_*:=\sup_{\|x\|\le 1}\langle \xi,x\rangle.

Then

\|\cdot\|^*=\delta_{B_*}, \qquad B_*:=\{\xi\in E^*:\|\xi\|_*\le 1\}.

Indeed, if \|\xi\|_*\le 1, then

\langle \xi,x\rangle-\|x\|\le \|\xi\|_*\|x\|-\|x\|\le 0 \qquad \forall x\in E,

so \|\cdot\|^*(\xi)=0. If \|\xi\|_*>1, choose x_0\in E such that \langle \xi,x_0\rangle>\|x_0\|; then

\langle \xi,tx_0\rangle-\|tx_0\| = t\bigl(\langle \xi,x_0\rangle-\|x_0\|\bigr)\to +\infty \qquad (t\to+\infty),

so \|\cdot\|^*(\xi)=+\infty.
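The dichotomy can be observed numerically. The sketch below assumes E=\mathbb{R}^2 with the \ell_1 norm, whose dual norm is the \ell_\infty norm, and truncates the supremum to a square of growing radius; the helper names and sample covectors are hypothetical.

```python
def norm1(x):
    return abs(x[0]) + abs(x[1])

def pairing(xi, x):
    return xi[0] * x[0] + xi[1] * x[1]

def truncated_conjugate(xi, radius, steps=40):
    """Maximum of <xi,x> - ||x||_1 over a grid on the square [-radius, radius]^2."""
    pts = [-radius + 2.0 * radius * k / steps for k in range(steps + 1)]
    return max(pairing(xi, (a, b)) - norm1((a, b)) for a in pts for b in pts)

inside = (0.5, -0.8)     # dual norm max(|0.5|, |-0.8|) = 0.8 <= 1
outside = (1.5, 0.0)     # dual norm 1.5 > 1
for radius in (1.0, 10.0, 100.0):
    print(radius, truncated_conjugate(inside, radius), truncated_conjugate(outside, radius))
```

For the covector inside the dual unit ball the truncated value stays at 0 for every radius, while for the covector outside it grows linearly with the radius, consistent with the value +\infty in the limit.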

Example
4.4 Exponential and negative entropy

Let f(x)=e^x on \mathbb{R}. Then

f^*(\xi)= \begin{cases} \xi\log \xi-\xi,&\xi\ge 0,\\ +\infty,&\xi<0, \end{cases}

with the convention 0\log 0:=0. This is the basic exponential/entropy conjugate pair.
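The formula follows from the same one-variable calculus as in Example 4.1. The derivation below is a sketch that uses only the definition of the conjugate and the concavity of x\mapsto \xi x-e^x.

```latex
\begin{aligned}
\xi>0:&\quad \frac{d}{dx}\bigl(\xi x-e^x\bigr)=\xi-e^x=0 \iff x=\log\xi,
  \quad\text{so}\quad f^*(\xi)=\xi\log\xi-e^{\log\xi}=\xi\log\xi-\xi,\\
\xi=0:&\quad f^*(0)=\sup_{x\in\mathbb{R}}\{-e^x\}=0 \quad (\text{approached as } x\to-\infty),\\
\xi<0:&\quad \xi x-e^x\to+\infty \quad (x\to-\infty), \quad\text{so}\quad f^*(\xi)=+\infty.
\end{aligned}
```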

The local Verso page contains a one-dimensional conjugate animation. The reader can edit a convex function f:\mathbb{R}\to\mathbb{R} directly while watching the supporting line and the conjugate graph update numerically.

4.3. Marginal Duality and Applications

For subsets A,B\subseteq Y, write A-B:=\{a-b:a\in A,\ b\in B\}=A+(-B).

Theorem
4.4 Marginal duality / perturbation duality

Let X and U be finite-dimensional real vector spaces, and let

\Phi:X\times U\to \mathbb{R}\cup\{+\infty\}

be convex. Define the marginal value function of \Phi as

p:U\to \mathbb{R}\cup\{\pm\infty\}, \qquad p(u):=\inf_{x\in X}\Phi(x,u).

Here p may take the value -\infty, even though \Phi itself only takes values in \mathbb{R}\cup\{+\infty\}. Then the following hold.

  1. For every y\in U^*,

    p^*(y)=\Phi^*(0,y).

    Consequently, for every u\in U and every y\in U^*,

    p(u)\ge -\Phi^*(0,y)+\langle y,u\rangle.

    In particular,

    \sup_{y\in U^*}\{-\Phi^*(0,y)\}\le p(0).

  2. If p(0)=-\infty, then

    \sup_{y\in U^*}\{-\Phi^*(0,y)\}=-\infty.

  3. If p(0)>-\infty and \partial p(0)\neq\varnothing, then

    p(0)=\max_{y\in U^*}\{-\Phi^*(0,y)\}.

    In particular, strong duality and dual attainment hold.

  4. Define

    D:=\{u\in U:\exists x\in X\text{ such that }\Phi(x,u)<+\infty\}=\operatorname{dom} p.

    If p(0)\in \mathbb{R} and 0\in \operatorname{ri}(D), then

    p(0)=\max_{y\in U^*}\{-\Phi^*(0,y)\}.

Proof

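Because \Phi is convex, its strict epigraph \{(x,u,t):\Phi(x,u)<t\} is a convex subset of X\times U\times \mathbb{R}. Since p(u)<t holds if and only if \Phi(x,u)<t for some x\in X, the strict epigraph of p is the projection

\{(u,t)\in U\times \mathbb{R}:p(u)<t\}=\{(u,t)\in U\times \mathbb{R}:\exists x\in X\text{ with }\Phi(x,u)<t\}.

Projections of convex sets are convex, so the strict epigraph of p is convex, and hence p is convex. (The epigraph of p itself need not equal the projection of \operatorname{epi}(\Phi) when the infimum defining p(u) is not attained.)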

Fix y\in U^*. By definition,

\begin{aligned} p^*(y) &=\sup_{u\in U}\{\langle y,u\rangle-p(u)\} \\ &=\sup_{u\in U}\sup_{x\in X}\{\langle y,u\rangle-\Phi(x,u)\} \\ &=\sup_{x\in X,\ u\in U}\{\langle 0,x\rangle+\langle y,u\rangle-\Phi(x,u)\} \\ &=\Phi^*(0,y). \end{aligned}

This proves the identity in part (1). For every u\in U, the definition of p^*(y) gives

p^*(y)=\sup_{v\in U}\{\langle y,v\rangle-p(v)\}\ge \langle y,u\rangle-p(u).

Using p^*(y)=\Phi^*(0,y), we obtain

p(u)+\Phi^*(0,y)\ge \langle y,u\rangle,

which is exactly

p(u)\ge -\Phi^*(0,y)+\langle y,u\rangle.

Assume now that p(0)=-\infty. For every M>0, choose x_M\in X such that

\Phi(x_M,0)\le -M.

Then, for every y\in U^*,

\Phi^*(0,y)\ge \langle 0,x_M\rangle+\langle y,0\rangle-\Phi(x_M,0)\ge M.

Since M is arbitrary, \Phi^*(0,y)=+\infty for every y, and hence

\sup_{y\in U^*}\{-\Phi^*(0,y)\}=-\infty.

This proves part (2).

Assume next that p(0)>-\infty and choose y\in \partial p(0). Then

\forall u\in U,\qquad p(u)\ge p(0)+\langle y,u\rangle.

Equivalently,

\forall u\in U,\qquad \langle y,u\rangle-p(u)\le -p(0).

Taking the supremum over u gives

p^*(y)\le -p(0).

On the other hand, evaluating at u=0 gives

p^*(y)\ge \langle y,0\rangle-p(0)=-p(0).

Hence

p^*(y)=-p(0).

Using part (1) of Theorem 4.4,

p(0)=-p^*(y)=-\Phi^*(0,y).

Combined with the weak-duality inequality from part (1), this yields

p(0)=\max_{v\in U^*}\{-\Phi^*(0,v)\},

with the maximum attained at the chosen y. This proves part (3).

Finally, define

D:=\{u\in U:\exists x\in X\text{ such that }\Phi(x,u)<+\infty\}.

If u\in D, then there exists x\in X such that \Phi(x,u)<+\infty, so

p(u)=\inf_{x'\in X}\Phi(x',u)\le \Phi(x,u)<+\infty,

hence u\in \operatorname{dom} p. Conversely, if u\in \operatorname{dom} p, then p(u)<+\infty, so not all values \Phi(x,u) can equal +\infty; therefore there exists x\in X such that \Phi(x,u)<+\infty, and hence u\in D. Thus

D=\operatorname{dom} p.

Assume now that p(0)\in \mathbb{R} and 0\in \operatorname{ri}(D)=\operatorname{ri}(\operatorname{dom} p). We first show that p never takes the value -\infty. Fix u\in \operatorname{dom} p. Since 0\in \operatorname{ri}(\operatorname{dom} p), the segment from u to 0 can be extended slightly beyond 0 inside \operatorname{dom} p: there exists \varepsilon>0 such that w:=-\varepsilon u\in \operatorname{dom} p. With \lambda:=\varepsilon/(1+\varepsilon)\in(0,1) we have 0=\lambda u+(1-\lambda)w, and convexity gives

p(0)\le \lambda p(u)+(1-\lambda)p(w).

If p(u)=-\infty, then the right-hand side equals -\infty because p(w)<+\infty, contradicting p(0)\in \mathbb{R}. Therefore p(u)>-\infty. Hence p:U\to \mathbb{R}\cup\{+\infty\} is proper and convex, with 0\in \operatorname{ri}(\operatorname{dom} p). By Theorem 2.10, we obtain \partial p(0)\neq\varnothing. Part (3) now applies and proves part (4).
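A one-dimensional toy instance makes the statement concrete. The sketch assumes the hypothetical perturbation \Phi(x,u):=x^2+\delta_{[1+u,+\infty)}(x) on \mathbb{R}\times\mathbb{R}, that is, minimizing x^2 subject to x\ge 1+u. Here p(u)=\max\{1+u,0\}^2, so p(0)=1 and \partial p(0)=\{2\}, and a direct computation gives -\Phi^*(0,y)=y-y^2/4 for y\ge 0. The code approximates both sides on integer-indexed grids; all names and grid sizes are illustrative.

```python
N = 20                            # grid resolution: index i represents the value i / N
IDX = range(-3 * N, 3 * N + 1)    # x and u range over [-3, 3]

def feasible(i, j):
    """Phi(x, u) is finite exactly when x >= 1 + u, with x = i/N and u = j/N."""
    return i >= N + j

def primal_value():
    """p(0) = inf over x of Phi(x, 0), i.e. the minimum of x^2 over grid points x >= 1."""
    return min((i / N) ** 2 for i in IDX if feasible(i, 0))

def dual_objective(y):
    """-Phi^*(0, y), with the supremum over (x, u) restricted to the grid."""
    conj = max(y * (j / N) - (i / N) ** 2
               for i in IDX for j in IDX if feasible(i, j))
    return -conj

# Only y >= 0 is scanned, since Phi^*(0, y) = +infinity for y < 0 in this instance.
ys = [k / 10 for k in range(0, 41)]
best_y = max(ys, key=dual_objective)
print(primal_value(), best_y, dual_objective(best_y))
```

The dual maximizer found on the grid coincides with the subgradient y=2 of p at 0, as part (3) predicts.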

Formal Statement and Proof

Lean theorems: Lecture04.thm_l4_marginal_duality_part1, Lecture04.thm_l4_marginal_duality_part2, Lecture04.thm_l4_marginal_duality_part3, Lecture04.thm_l4_marginal_duality.

theorem Lecture04_thm_l4_marginal_duality_part1 {X Y : Type*}
    [NormedAddCommGroup X] [NormedSpace ℝ X] [NormedAddCommGroup Y] [NormedSpace ℝ Y]
    (Φ : X × Y → EReal) :
    (∀ y : Covector Y,
      fenchelConjugate (marginal (X := X) (Y := Y) Φ) y =
        fenchelConjugate Φ (prodSndCovector (X := X) y)) ∧
    (∀ u : Y, ∀ y : Covector Y,
      affineMinus y (marginal (X := X) (Y := Y) Φ) u ≤
        fenchelConjugate Φ (prodSndCovector (X := X) y)) ∧
    marginalDualValue (X := X) (Y := Y) Φ ≤ marginal (X := X) (Y := Y) Φ (0 : Y)

theorem Lecture04_thm_l4_marginal_duality_part2 {X Y : Type*}
    [NormedAddCommGroup X] [NormedSpace ℝ X] [NormedAddCommGroup Y] [NormedSpace ℝ Y]
    (Φ : X × Y → EReal)
    (hprimal : marginal (X := X) (Y := Y) Φ (0 : Y) = ⊥) :
    marginalDualValue (X := X) (Y := Y) Φ = ⊥

theorem Lecture04_thm_l4_marginal_duality_part3 {X Y : Type*}
    [NormedAddCommGroup X] [NormedSpace ℝ X] [NormedAddCommGroup Y] [NormedSpace ℝ Y]
    (Φ : X × Y → EReal)
    (hzero_bot : marginal (X := X) (Y := Y) Φ (0 : Y) ≠ ⊥)
    (hzero_top : marginal (X := X) (Y := Y) Φ (0 : Y) ≠ ⊤)
    {y : Covector Y}
    (hsub : Lecture02.IsERealSubgradient (marginal (X := X) (Y := Y) Φ) (0 : Y) y) :
    marginal (X := X) (Y := Y) Φ (0 : Y) = marginalDualValue (X := X) (Y := Y) Φ ∧
    marginalDualValue (X := X) (Y := Y) Φ = marginalDualObjective (X := X) (Y := Y) Φ y

theorem Lecture04_thm_l4_marginal_duality {X Y : Type*}
    [NormedAddCommGroup X] [NormedSpace ℝ X] [NormedAddCommGroup Y] [NormedSpace ℝ Y]
    [FiniteDimensional ℝ Y] (Φ : X × Y → EReal)
    (hΦ_convex : EConvexOn Set.univ Φ)
    (hzero_bot : marginal (X := X) (Y := Y) Φ (0 : Y) ≠ ⊥)
    (hri : (0 : Y) ∈ intrinsicInterior ℝ (effectiveDomain (marginal (X := X) (Y := Y) Φ))) :
    ∃ y : Covector Y,
      marginal (X := X) (Y := Y) Φ (0 : Y) = marginalDualValue (X := X) (Y := Y) Φ ∧
      marginalDualValue (X := X) (Y := Y) Φ = marginalDualObjective (X := X) (Y := Y) Φ y

One especially useful specialization of Theorem 4.4 is the template

\inf_{x\in X}\{f(x)+g(Ax)\}.

We postpone that reduction to Exercise 4.1. The worked applications below instead compute the relevant perturbation conjugates directly from Theorem 4.4.

Example
4.5 Recovering the LP dual

Consider the linear program

\inf\{c^\top x:Ax\ge b,\ x\ge 0\},

with A\in \mathbb{R}^{m\times n}, b\in \mathbb{R}^m, and c\in \mathbb{R}^n. Define the perturbation

\Phi(x,u):=c^\top x+\delta_{\mathbb{R}_+^n}(x)+\delta_{\mathbb{R}_+^m}(Ax-b-u), \qquad (x,u)\in \mathbb{R}^n\times \mathbb{R}^m.

Then

p(u):=\inf_{x\in \mathbb{R}^n}\Phi(x,u) = \inf\{c^\top x:Ax\ge b+u,\ x\ge 0\}.

For y\in \mathbb{R}^m,

\Phi^*(0,y)= \begin{cases} -b^\top y,&A^\top y\le c,\ y\ge 0,\\ +\infty,&\text{otherwise}. \end{cases}

Therefore Theorem 4.4 recovers the dual formula

\sup\{b^\top y:A^\top y\le c,\ y\ge 0\} \le \inf\{c^\top x:Ax\ge b,\ x\ge 0\},

and, if 0\in \operatorname{ri}(\operatorname{dom} p) and the primal value is finite, it gives

\inf\{c^\top x:Ax\ge b,\ x\ge 0\} = \max\{b^\top y:A^\top y\le c,\ y\ge 0\}.

This recovers the usual LP dual from the general perturbation template. Lecture 3 proves a sharper strong-duality statement using polyhedral structure, without this extra relative-interior hypothesis.

Proof

The formula for p(u) is immediate from the definition of \Phi. For y\in \mathbb{R}^m,

\begin{aligned} \Phi^*(0,y) &=\sup_{x\in \mathbb{R}^n,\ u\in \mathbb{R}^m} \{y^\top u-c^\top x-\delta_{\mathbb{R}_+^n}(x)-\delta_{\mathbb{R}_+^m}(Ax-b-u)\}. \end{aligned}

If some coordinate of y is negative, then the constraint Ax-b-u\in \mathbb{R}_+^m only imposes an upper bound u\le Ax-b, so sending the corresponding coordinate of u to -\infty shows that \Phi^*(0,y)=+\infty. Hence we may restrict to y\ge 0. For such y, the supremum over u is attained at the largest feasible choice u=Ax-b, so

\Phi^*(0,y) =\sup_{x\ge 0}\{y^\top(Ax-b)-c^\top x\} =-b^\top y+\sup_{x\ge 0}\{x^\top(A^\top y-c)\}.

If A^\top y\le c, then x^\top(A^\top y-c)\le 0 for every x\ge 0, so the supremum is 0, attained at x=0. If instead some coordinate of A^\top y-c is positive, then scaling the corresponding basis vector shows that the supremum is +\infty. This proves the displayed formula for \Phi^*(0,y).

Part (1) of Theorem 4.4 then gives

\sup_{y\in \mathbb{R}^m}\{-\Phi^*(0,y)\}\le p(0),

which is exactly the weak-duality inequality

\sup\{b^\top y:A^\top y\le c,\ y\ge 0\} \le \inf\{c^\top x:Ax\ge b,\ x\ge 0\}.

If in addition 0\in \operatorname{ri}(\operatorname{dom} p) and p(0)\in \mathbb{R}, then part (4) of Theorem 4.4 yields the claimed equality and dual attainment.
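Weak duality and the optimality certificate can be checked on a small instance. The data A, b, c below are hypothetical, and the grids are chosen so that they contain the optimal pair x=(2,2), y=(1,1); this is a brute-force sketch, not an LP solver.

```python
# Hypothetical LP data: minimize c.x subject to A x >= b, x >= 0.
A = [[1.0, 1.0],
     [1.0, 2.0]]
b = [4.0, 6.0]
c = [2.0, 3.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x):
    return all(xi >= 0 for xi in x) and all(dot(row, x) >= bi for row, bi in zip(A, b))

def dual_feasible(y):
    cols = list(zip(*A))     # columns of A, i.e. rows of A^T
    return all(yi >= 0 for yi in y) and all(dot(col, y) <= cj for col, cj in zip(cols, c))

xs = [(i / 2, j / 2) for i in range(17) for j in range(17)]   # grid on [0, 8]^2
ys = [(i / 4, j / 4) for i in range(13) for j in range(13)]   # grid on [0, 3]^2
primal_vals = [dot(c, x) for x in xs if primal_feasible(x)]
dual_vals = [dot(b, y) for y in ys if dual_feasible(y)]

# Weak duality: every dual value is at most every primal value.
print(max(dual_vals), min(primal_vals))
```

For this data the feasible pair x=(2,2), y=(1,1) has equal objective values c^\top x=b^\top y=10, so weak duality certifies that both are optimal.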

Example
4.6 Norm regularization and dual-norm constraints

Let X and Y be finite-dimensional real normed spaces, let A:X\to Y be linear, let \lambda>0, and let f:X\to \mathbb{R}\cup\{+\infty\} be proper, closed, and convex. Consider

\inf_{x\in X}\{f(x)+\lambda\|Ax\|\}.

Then

\inf_{x\in X}\{f(x)+\lambda\|Ax\|\} = \max_{\substack{y\in Y^*\\ \|y\|_*\le \lambda}} \{-f^*(-A^*y)\}.

Proof

Set

g(u):=\lambda\|u\|, \qquad \Phi(x,u):=f(x)+g(Ax+u).

Define the marginal value function

p(u):=\inf_{x\in X}\Phi(x,u)=\inf_{x\in X}\{f(x)+\lambda\|Ax+u\|\}.

Because g is finite everywhere on Y, one has \operatorname{dom} g=Y. Since f is proper, there exists x_0\in X with f(x_0)<+\infty, so for every u\in Y,

p(u)\le f(x_0)+\lambda\|Ax_0+u\|<+\infty.

Therefore

\operatorname{dom} p=Y, \qquad 0\in \operatorname{ri}(\operatorname{dom} p).

For y\in Y^*,

\begin{aligned} \Phi^*(0,y) &=\sup_{x\in X,\ u\in Y}\{\langle y,u\rangle-f(x)-g(Ax+u)\} \\ &=\sup_{x\in X,\ z\in Y}\{\langle y,z-Ax\rangle-f(x)-g(z)\} \\ &=\sup_{x\in X}\{\langle -A^*y,x\rangle-f(x)\} +\sup_{z\in Y}\{\langle y,z\rangle-g(z)\} \\ &=f^*(-A^*y)+g^*(y). \end{aligned}

We now compute g^*. For y\in Y^*,

g^*(y)=\sup_{u\in Y}\{\langle y,u\rangle-\lambda\|u\|\}.

If \|y\|_*\le \lambda, then for every u\in Y,

\langle y,u\rangle\le \|y\|_*\|u\|\le \lambda\|u\|,

so

\langle y,u\rangle-\lambda\|u\|\le 0.

Taking u=0 shows that the supremum is exactly 0.

If instead \|y\|_*>\lambda, then by the definition of the dual norm there exists u_0\in Y such that

\langle y,u_0\rangle>\lambda\|u_0\|.

For every t>0,

\langle y,tu_0\rangle-\lambda\|tu_0\| = t\bigl(\langle y,u_0\rangle-\lambda\|u_0\|\bigr)\to +\infty.

Hence g^*(y)=+\infty. Therefore

g^*(y)= \begin{cases} 0,&\|y\|_*\le \lambda,\\ +\infty,&\|y\|_*>\lambda. \end{cases}

If p(0)=-\infty, then part (2) of Theorem 4.4 gives

\sup_{y\in Y^*}\{-\Phi^*(0,y)\}=-\infty.

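In that case -f^*(-A^*y)=-\infty for every y with \|y\|_*\le \lambda, because \Phi^*(0,y)=f^*(-A^*y)+g^*(y) and g^*(y)=0 on this set. Since the feasible set \{y\in Y^*:\|y\|_*\le \lambda\} contains 0 and is therefore nonempty, the maximum on the right-hand side is attained (with value -\infty) at every feasible y, and the desired formula holds with both sides equal to -\infty.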

If instead p(0)>-\infty, then p(0)\in \mathbb{R} because p(0)<+\infty, and part (4) of Theorem 4.4 yields

p(0)=\max_{y\in Y^*}\{-\Phi^*(0,y)\}.

In this case, since p(0)=\inf_{x\in X}\{f(x)+\lambda\|Ax\|\}, substituting the formula for \Phi^*(0,y) and then the formula for g^*(y) yields

\inf_{x\in X}\{f(x)+\lambda\|Ax\|\} = \max_{\substack{y\in Y^*\\ \|y\|_*\le \lambda}} \{-f^*(-A^*y)\}.
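The identity can be checked on a scalar instance. The sketch assumes X=Y=\mathbb{R}, A the identity, \lambda=0.5, and the hypothetical choice f(x)=(x-1)^2/2, whose conjugate is f^*(\xi)=\xi+\xi^2/2; both sides are approximated by grid search.

```python
LAM = 0.5   # regularization weight lambda (hypothetical value)

def f(x):
    return 0.5 * (x - 1.0) ** 2

def f_star(xi):
    # Conjugate of f: the supremum of xi*x - (x-1)^2/2 is attained at x = 1 + xi.
    return xi + 0.5 * xi * xi

def primal(x):
    return f(x) + LAM * abs(x)        # A is the identity on R

def dual(y):
    return -f_star(-y)                # -f^*(-A^* y) with A^* the identity

xs = [i / 1000 for i in range(-3000, 3001)]     # primal grid on [-3, 3]
ys = [j / 1000 for j in range(-500, 501)]       # dual-norm ball |y| <= LAM
primal_value = min(primal(x) for x in xs)
dual_value = max(dual(y) for y in ys)
print(primal_value, dual_value)
```

In this instance the primal minimizer is x=0.5 and the dual maximizer sits on the boundary y=\lambda of the dual-norm ball, with common value 0.375.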

4.4. Exercises

  1. Exercise 4.1. Fenchel--Rockafellar specialization of Theorem 4.4.

    Let X and Y be finite-dimensional real vector spaces, let A:X\to Y be linear, and let

    f:X\to \mathbb{R}\cup\{+\infty\},\qquad g:Y\to \mathbb{R}\cup\{+\infty\}

    be proper, closed, and convex. Do the following.

    1. Define

      \Phi(x,u):=f(x)+g(Ax+u)

      and prove that \Phi is convex on X\times Y.

    2. Define

      p(u):=\inf_{x\in X}\Phi(x,u)=\inf_{x\in X}\{f(x)+g(Ax+u)\}.

      Show that

      p(0)=\inf_{x\in X}\{f(x)+g(Ax)\}.

    3. Prove that for every y\in Y^*,

      \Phi^*(0,y)=f^*(-A^*y)+g^*(y).

    4. Show that

      \operatorname{dom} p=\operatorname{dom} g-A(\operatorname{dom} f) =-\bigl(A(\operatorname{dom} f)-\operatorname{dom} g\bigr).

      Deduce that

      0\in \operatorname{ri}\bigl(A(\operatorname{dom} f)-\operatorname{dom} g\bigr) \iff 0\in \operatorname{ri}(\operatorname{dom} p).

    5. Use part (1) of Theorem 4.4 to prove

      \sup_{y\in Y^*}\{-f^*(-A^*y)-g^*(y)\} \le \inf_{x\in X}\{f(x)+g(Ax)\}.

    6. Assume additionally that

      0\in \operatorname{ri}\bigl(A(\operatorname{dom} f)-\operatorname{dom} g\bigr)

      and that the primal infimum is finite. Use part (4) of Theorem 4.4 to prove

      \inf_{x\in X}\{f(x)+g(Ax)\} = \max_{y\in Y^*}\{-f^*(-A^*y)-g^*(y)\}.

  2. Exercise 4.2. Dual norm twice.

    Let \|\cdot\| be a norm on E, and let \|\cdot\|_* be its dual norm on E^*. Prove that, under the natural identification E\simeq E^{**}, the dual norm of \|\cdot\|_* is the original norm:

    \|x\|_{**}=\|x\| \qquad \forall x\in E.