1. Lecture 1: Introduction and Convexity
At a high level, optimization packages a decision task into three objects: a
variable x, a feasible set \Omega, and an objective f. The feasible
set records what is allowed; the objective records what is preferred. This
language is broad enough to cover resource allocation, profit maximization,
training a machine learning model, maximum-likelihood estimation, control, and
minimum-energy problems. Once a model is written this way, the basic question
becomes: among all feasible choices, which one has the smallest cost? In this
course, we focus on continuous optimization problems in which the domain E
is a finite-dimensional real vector space.
Let f : E \to \mathbb{R} and let \Omega \subseteq E be nonempty. The
optimization problem associated with (f,\Omega) is
p^\star := \inf\{f(x) : x \in \Omega\}.
A point x^\star \in \Omega is a global minimizer if
f(x^\star) = p^\star, i.e.,
x^\star \in \arg\min_{x \in \Omega} f(x).
Let (f,\Omega) be an optimization problem, and let
p^\star := \inf\{f(x) : x \in \Omega\}.
For every \varepsilon > 0, a point \widehat x \in \Omega is called
\varepsilon-optimal if
f(\widehat x) \le p^\star + \varepsilon.
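As a concrete illustration of why a near-optimal solution concept is useful, consider a case where the infimum is not attained; the function below is a standard supplementary example, not one from the lecture's running examples.

```latex
Take $f(x) = e^{-x}$ and $\Omega = \mathbb{R}$, so that
\[
  p^\star = \inf_{x \in \mathbb{R}} e^{-x} = 0 ,
\]
which no feasible point attains. Yet for every $\varepsilon > 0$ the point
$\widehat{x} := \log(1/\varepsilon)$ satisfies
\[
  f(\widehat{x}) = e^{-\log(1/\varepsilon)} = \varepsilon \le p^\star + \varepsilon,
\]
so $\widehat{x}$ is $\varepsilon$-optimal even though no exact minimizer exists.
```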
Basic outcomes of an optimization problem. Even at this level, several different things can happen. The feasible set may be empty, the infimum may be finite but not attained, or the problem may be unbounded below. This is one reason a near-optimal solution concept is introduced: exact minimizers may fail to exist, and even when they do exist they may be harder to compute than near-optimal points. Before discussing certificates of optimality, it is therefore natural to ask a more basic question: when does an exact minimizer exist at all? The next theorem gives the standard topological answer under compactness and continuity.
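Each of these outcomes already occurs in one dimension; the following instances are standard illustrations added here, not part of the lecture.

```latex
\[
  \Omega_1 = \{x \in \mathbb{R} : x^2 \le -1\} = \emptyset
  \quad \text{(infeasible; by convention } p^\star = +\infty\text{)},
\]
\[
  f_2(x) = \tfrac{1}{x},\ \ \Omega_2 = (0,\infty)
  \quad \text{(} p^\star = 0 \text{ is finite but not attained)},
\]
\[
  f_3(x) = x,\ \ \Omega_3 = \mathbb{R}
  \quad \text{(unbounded below; } p^\star = -\infty\text{)}.
\]
```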
Let \Omega \subseteq E be nonempty and compact, and let
f : \Omega \to \mathbb{R} be continuous. Then there exists
x^\star \in \Omega such that
f(x^\star) = \min_{x \in \Omega} f(x).
Proof
Because \Omega is compact and f is continuous on \Omega, the image
set f(\Omega) \subseteq \mathbb{R} is compact. In particular, f(\Omega)
is nonempty, closed, and bounded below, so it contains its infimum. Choose
m \in f(\Omega) such that
m = \inf f(\Omega).
Then there exists x^\star \in \Omega with f(x^\star) = m. For every
x \in \Omega one has f(x) \in f(\Omega) and therefore m \le f(x).
Hence
f(x^\star) = m = \min_{x \in \Omega} f(x).
Formal Statement and Proof
Lean theorem: Lecture01.thm_l1_weier.
theorem proof_of_Lecture01_thm_l1_weier {E : Type*} [TopologicalSpace E]
{Ω : Set E} {f : E → ℝ}
(hΩ_compact : IsCompact Ω)
(hΩ_nonempty : Ω.Nonempty)
(hf : ContinuousOn f Ω) :
∃ xStar ∈ Ω, IsMinOn f Ω xStar
(The tactic proof is collapsed in this rendering: only the interactive proof states were exported, and they have been removed here. Lean reports: All goals completed! 🐙)
Specification versus computation. An optimization specification tells us
what the mathematical problem is: it determines the feasible set, the
objective, the optimal value, and the solution notions we care about, such as
exact minimizers or \varepsilon-optimal points. But this still does not
determine a computational problem. To speak about algorithms and complexity, we
must also specify how the instance is presented, which primitive operations
are available, what cost model is being counted, and what kind of output is
required.
For finitely described models such as linear and quadratic programs, the input is a finite list of numbers and one counts arithmetic or bit operations. For large language model pretraining in practice, the input is a huge text corpus together with code for the model architecture, and one measures total training wall-clock time on a GPU cluster. The same mathematical specification can therefore lead to very different computational questions under different access models. For example, the computational question for LLM pretraining changes substantially if the GPU cluster is rented rather than owned, so that the real cost is the rental bill rather than the wall-clock time. Even for simple problems that admit closed-form solutions, the computational question differs depending on whether the unit-cost arithmetic operations are finite-precision or infinite-precision; indeed, many algorithms are preferred precisely for their numerical stability.
For the purposes of this course, we will mostly work in an abstract and therefore general setting: we assume that the cheap primitive is a query oracle returning local information about the objective at a given point, such as its value, gradient, or Hessian. Throughout the course we also assume that we can work with real numbers in infinite precision.
Let U \subseteq E be open and let f : U \to \mathbb{R}. We say that
f is differentiable on U if it is differentiable at every point of
U. In that case the first-order object at x \in U is the differential
\nabla f(x) := Df(x) \in E^\ast, \qquad x \in U,
and expressions such as \langle \nabla f(x), h \rangle are to be read as
dual pairings.
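A concrete instance of this first-order object, added here as a standard supplementary example, is the gradient of a quadratic.

```latex
On $E = \mathbb{R}^n$ with the standard pairing, let
$f(x) = \tfrac12 x^{\top} A x - b^{\top} x$ with $A \in \mathbb{R}^{n \times n}$
symmetric and $b \in \mathbb{R}^n$. Then the differential acts by
\[
  \langle \nabla f(x), h \rangle = (Ax - b)^{\top} h,
  \qquad h \in \mathbb{R}^n,
\]
so $\nabla f(x)$ is identified with the vector $Ax - b$.
```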
Throughout this lecture, let E be a finite-dimensional real normed space
and let E^\ast be its dual. We write
\langle \xi, h \rangle, \qquad \xi \in E^\ast,\ h \in E,
for the dual pairing. If a genuine inner product is being used, we will say so
explicitly or decorate the notation, for instance by writing
\langle u, v \rangle_H or \langle u, v \rangle_2.
A set C \subseteq E is convex if
\forall x,y \in C,\ \forall \theta \in [0,1],\qquad
\theta x + (1-\theta)y \in C.
Let C \subseteq E be convex and let f : C \to \mathbb{R}. The function
f is convex if
\forall x,y \in C,\ \forall \theta \in [0,1],\qquad
f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y).
It is strictly convex if
\forall x,y \in C \text{ with } x \ne y,\ \forall \theta \in (0,1),\qquad
f(\theta x + (1-\theta)y) < \theta f(x) + (1-\theta)f(y).
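To see the definitions in action, one can verify the convexity inequality for the square function directly; this computation is supplementary to the lecture.

```latex
For $f(x) = x^2$ on $\mathbb{R}$, $x, y \in \mathbb{R}$, and $\theta \in [0,1]$,
\[
  \theta x^2 + (1-\theta) y^2 - \bigl(\theta x + (1-\theta) y\bigr)^2
  = \theta(1-\theta)(x-y)^2 \ \ge\ 0,
\]
with strict inequality whenever $x \ne y$ and $\theta \in (0,1)$, so $f$ is
strictly convex. By contrast, $f(x) = |x|$ satisfies the convexity inequality
with equality along rays from the origin (for instance when $x, y \ge 0$), so it
is convex but not strictly convex.
```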
1.1. Why Convexity?
The point of studying convex optimization is not that all realistic optimization problems are convex. Rather, convexity is a good first structural assumption because it is strong enough to yield real global theorems, but not so strong that the subject collapses into a few toy examples.
- Convexity is strong enough to make local information globally meaningful. Without additional structure, local information is usually only local. A derivative, an affine approximation, or a second-order expansion near one point says very little about what happens elsewhere. Convexity is the first major structural assumption in the course under which this changes in a robust way. Gradients, subgradients, supporting hyperplanes, and dual certificates stop being merely local descriptions and start serving as global lower bounds.
- Convexity is broad and stable enough to support a systematic theory. A useful structural assumption should not apply only to a tiny family of specially engineered problems. Convexity still contains linear programs, least squares, logistic regression, constrained quadratic programs, second-order cone programs, semidefinite programs, and many regularized learning models. It is also stable under the operations optimization repeatedly uses, such as nonnegative linear combinations, affine changes of variables, epigraph constructions, partial minimization, and conjugation. Because of that, convex optimization can be developed as a real theory rather than a disconnected collection of tricks.
- Convex optimization is the right baseline testbed for algorithms and complexity. Even when the eventual application is nonconvex, convex optimization is the cleanest setting in which one can first isolate the role of local primitives, prove global guarantees, and identify genuine complexity barriers. If a method cannot even be explained or stabilized on convex problems, then its behavior on more complicated problems is harder to interpret, not easier.
- Convex optimization is also a source of ideas that survive beyond the convex setting. Its value is not limited to problems that are themselves convex. Many methods and viewpoints that later matter more broadly were first discovered, justified, or conceptually clarified in convex and online convex optimization. A concrete example is modern LLM training: the objective is highly nonconvex, yet the default optimizer AdamW belongs to a line of adaptive first-order methods whose ancestry runs through AdaGrad, a method that emerged from theoretical work in online convex optimization.
1.2. First Consequences of Convexity
The next results are the first concrete consequences of convexity. For convex functions, local optimality is already global. On a convex feasible set, differentiability yields a first-order necessary condition for minimizers. Once convexity is added, that same first-order sign condition becomes not only necessary but also sufficient for global optimality.
Let \Omega \subseteq E be nonempty and convex, let
f : \Omega \to \mathbb{R} be convex, and let x^\star \in \Omega. If
there exists r > 0 such that
\forall x \in \Omega \cap \{y \in E : \|y - x^\star\| < r\},\qquad
f(x^\star) \le f(x),
then
\forall x \in \Omega,\qquad f(x^\star) \le f(x).
Proof
Assume for contradiction that there exists x \in \Omega such that
f(x) < f(x^\star).
Because \Omega is convex, for every \theta \in (0,1) the point
x_\theta := \theta x + (1-\theta)x^\star
belongs to \Omega. By convexity of f,
f(x_\theta) \le \theta f(x) + (1-\theta)f(x^\star) < f(x^\star).
Moreover,
\|x_\theta - x^\star\| = \theta \|x - x^\star\|.
Choosing \theta > 0 sufficiently small makes
\|x_\theta - x^\star\| < r. Then
x_\theta \in \Omega \cap \{y \in E : \|y - x^\star\| < r\} and
f(x_\theta) < f(x^\star),
contradicting the assumed local minimality of x^\star. Therefore no such
x exists, and so
\forall x \in \Omega,\qquad f(x^\star) \le f(x).
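Convexity is essential here: without it, a local minimizer can fail to be global. The following one-dimensional instance is a standard counterexample, added for illustration.

```latex
Take $f(x) = x^3 - 3x$ on $\Omega = \mathbb{R}$, which is not convex. Since
$f'(x) = 3x^2 - 3$ and $f''(1) = 6 > 0$, the point $x = 1$ is a local
minimizer with $f(1) = -2$. Yet
\[
  f(-3) = -27 + 9 = -18 < -2 = f(1),
\]
and indeed $f(x) \to -\infty$ as $x \to -\infty$, so $f$ has no global
minimizer at all.
```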
Formal Statement and Proof
Lean theorem: Lecture01.thm_l1_local_global.
theorem proof_of_Lecture01_thm_l1_local_global {E : Type*}
[NormedAddCommGroup E] [NormedSpace ℝ E]
{Ω : Set E} {f : E → ℝ} {xStar : E}
(hxStar : xStar ∈ Ω) (hconv : ConvexOn ℝ Ω f)
(hlocal :
∃ r > 0,
∀ ⦃x : E⦄,
x ∈ Ω →
‖x - xStar‖ < r →
f xStar ≤ f x) :
∀ x ∈ Ω,
f xStar ≤ f x
(The tactic proof is collapsed in this rendering; the exported interactive proof states have been removed. The visible steps obtain the radius r > 0 from hlocal, establish IsLocalMinOn f Ω xStar via filter_upwards [inter_mem_nhdsWithin Ω (Metric.ball_mem_nhds xStar hr_pos)], and close the goal with hlocal. Lean reports: All goals completed! 🐙)
Let \Omega \subseteq E be nonempty and convex, let
f : E \to \mathbb{R} be differentiable, and let
x^\star \in \Omega be a global minimizer of f over \Omega. Then
\forall x \in \Omega,\qquad
\langle \nabla f(x^\star), x - x^\star \rangle \ge 0.
Proof
Fix any x \in \Omega and define
\phi(t) := f\bigl(x^\star + t(x-x^\star)\bigr),
\qquad t \in [0,1].
Because \Omega is convex, one has
x^\star + t(x-x^\star) \in \Omega for every t \in [0,1]. Since
x^\star is a global minimizer of f over \Omega,
\phi(t) \ge \phi(0)
\qquad \forall t \in [0,1].
Hence, for every t \in (0,1],
\frac{\phi(t)-\phi(0)}{t} \ge 0.
Because f is differentiable, \phi is differentiable at 0 with
\phi'(0) = \langle \nabla f(x^\star), x-x^\star \rangle.
Letting t \downarrow 0 in the difference quotient therefore yields
\langle \nabla f(x^\star), x-x^\star \rangle \ge 0.
Since x \in \Omega was arbitrary, the claim follows.
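As a quick sanity check of this variational inequality, consider a one-dimensional constrained instance, supplementary to the lecture.

```latex
Let $f(x) = x^2$ and $\Omega = [1,2]$. The global minimizer is
$x^\star = 1$, with $\nabla f(x^\star) = 2 \ne 0$: unconstrained
stationarity fails at $x^\star$, but the variational inequality holds, since
for every $x \in [1,2]$
\[
  \langle \nabla f(x^\star), x - x^\star \rangle = 2(x-1) \ \ge\ 0 .
\]
```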
Formal Statement and Proof
Lean theorem: Lecture01.lem_l1_first_order_necessary.
theorem proof_of_Lecture01_lem_l1_first_order_necessary {E : Type*}
[NormedAddCommGroup E] [NormedSpace ℝ E]
{Ω : Set E} {f : E → ℝ} {xStar : E}
(hΩ_convex : Convex ℝ Ω) (hxStar : xStar ∈ Ω)
(hmin : ∀ y ∈ Ω, f xStar ≤ f y)
(hf : Differentiable ℝ f) :
∀ x ∈ Ω, 0 ≤ fderiv ℝ f xStar (x - xStar)
(The tactic proof is collapsed in this rendering; the exported interactive proof states have been removed. The visible steps define φ := f ∘ AffineMap.lineMap xStar x, show that the line map sends Set.Icc 0 1 into Ω, deduce IsMinOn φ (Set.Icc 0 1) 0 from the global minimality of xStar, compute HasDerivAt φ ((fderiv ℝ f xStar) (x - xStar)) 0 by the chain rule, show (1 : ℝ) ∈ posTangentConeAt (Set.Icc 0 1) 0, and conclude via IsLocalMinOn.hasFDerivWithinAt_nonneg. Lean reports: All goals completed! 🐙)
Let \Omega \subseteq E be nonempty and convex, let
f : E \to \mathbb{R} be differentiable, and assume that f is convex on
\Omega. Then
\forall x,y \in \Omega,\qquad
f(y) \ge f(x) + \langle \nabla f(x), y-x \rangle.
Proof
Fix x,y \in \Omega. For every t \in (0,1], convexity of f on
\Omega gives
f\bigl(x+t(y-x)\bigr) \le (1-t)f(x) + t f(y).
Rearranging, we obtain
\frac{f\bigl(x+t(y-x)\bigr)-f(x)}{t} \le f(y)-f(x).
Because f is differentiable at x, letting t \downarrow 0 yields
\langle \nabla f(x), y-x \rangle \le f(y)-f(x).
Equivalently,
f(y) \ge f(x) + \langle \nabla f(x), y-x \rangle.
Since x,y \in \Omega were arbitrary, the claim follows.
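A familiar special case of this gradient lower bound is the tangent-line inequality for the exponential, included here as a supplementary illustration.

```latex
With $E = \mathbb{R}$ and $f(x) = e^x$, which is convex and differentiable,
the lemma gives, for all $x, y \in \mathbb{R}$,
\[
  e^{y} \ \ge\ e^{x} + e^{x}(y - x).
\]
At $x = 0$ this recovers the classical bound $e^{y} \ge 1 + y$.
```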
Formal Statement and Proof
Lean theorem: Lecture01.lem_l1_gradient_lower_bound.
theorem proof_of_Lecture01_lem_l1_gradient_lower_bound {E : Type*}
[NormedAddCommGroup E] [NormedSpace ℝ E]
{Ω : Set E} {f : E → ℝ}
(hconv : ConvexOn ℝ Ω f) (hf : Differentiable ℝ f) :
    ∀ x ∈ Ω, ∀ y ∈ Ω, f y ≥ f x + fderiv ℝ f x (y - x) := by
  -- NB: this tactic script is reconstructed from the rendered proof states;
  -- tactics collapsed in the rendering are filled with plausible choices.
  intro x hx y hy
  by_cases hxy : y = x
  · -- Degenerate direction: y = x, and fderiv ℝ f x 0 = 0.
    subst hxy; simp
  · -- Parametrize the segment from x to y by g t = x + t • (y - x).
    set g : ℝ →ᵃ[ℝ] E := AffineMap.lineMap x y with hg
    have hmaps : Set.MapsTo (⇑g) (Set.Icc 0 1) Ω :=
      Convex.mapsTo_lineMap hconv.left hx hy
    -- The restriction t ↦ f (g t) is convex on [0, 1].
    have hconv_g : ConvexOn ℝ (Set.Icc (0 : ℝ) 1) (fun t : ℝ => f (g t)) :=
      ConvexOn.subset (ConvexOn.comp_affineMap g hconv) hmaps (convex_Icc 0 1)
    -- Its derivative at t = 0 is the directional derivative ⟨∇f(x), y - x⟩.
    have hgderiv :
        HasDerivAt (fun t : ℝ => f (g t)) ((fderiv ℝ f x) (y - x)) 0 := by
      simpa using
        (HasFDerivAt.comp_hasDerivAt_of_eq (x := (0 : ℝ))
          (hf x).hasFDerivAt
          (AffineMap.hasDerivAt_lineMap (a := x) (b := y) (x := (0 : ℝ)))
          (by simp))
    -- Convexity bounds the one-sided derivative at 0 by the slope over [0, 1].
    have hslope :
        (fderiv ℝ f x) (y - x) ≤ slope (fun t : ℝ => f (g t)) 0 1 :=
      hconv_g.le_slope_of_hasDerivAt (by simp) (by simp) zero_lt_one hgderiv
    -- That slope is exactly f y - f x, since g 0 = x and g 1 = y.
    have hslope' : (fderiv ℝ f x) (y - x) ≤ f y - f x := by
      simpa [hg] using hslope
    linarith
Let \Omega \subseteq E be nonempty and convex, let
f : E \to \mathbb{R} be differentiable, assume that f is convex on
\Omega, and let x^\star \in \Omega. Then the following are equivalent:
1. f(x^\star) \le f(x) for every x \in \Omega;
2. \langle \nabla f(x^\star), x - x^\star \rangle \ge 0 for every x \in \Omega.
Proof
We first prove that item (1) implies item (2). If
f(x^\star) \le f(x) for every x \in \Omega, then x^\star is a global
minimizer of f on \Omega. Applying the first-order necessary condition
yields
\forall x \in \Omega,\qquad
\langle \nabla f(x^\star), x-x^\star \rangle \ge 0.
We next prove that item (2) implies item (1). Fix any x \in \Omega.
Applying the global linear lower bound with x^\star and x, we obtain
f(x) \ge f(x^\star) + \langle \nabla f(x^\star), x-x^\star \rangle.
By item (2),
\langle \nabla f(x^\star), x-x^\star \rangle \ge 0.
Hence
f(x) \ge f(x^\star).
Because x \in \Omega was arbitrary, item (1) follows.
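To see the characterization in action (a worked example of ours, not from the source), minimize f(x) = x^2 over \Omega = [1,2] \subseteq \mathbb{R}. At x^\star = 1 we have \nabla f(x^\star) = 2, and
\langle \nabla f(x^\star), x - x^\star \rangle = 2(x-1) \ge 0
\qquad \text{for all } x \in [1,2],
so x^\star = 1 is the global minimizer even though \nabla f(x^\star) \neq 0: over a constrained set, the gradient need only be nonnegative along feasible directions, not zero.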
Formal Statement and Proof
Lean theorem: Lecture01.thm_l1_first_order_char.
theorem proof_of_Lecture01_thm_l1_first_order_char {E : Type*}
[NormedAddCommGroup E] [NormedSpace ℝ E]
{Ω : Set E} {f : E → ℝ} {xStar : E}
(hconv : ConvexOn ℝ Ω f) (hxStar : xStar ∈ Ω)
(hf : Differentiable ℝ f) :
(∀ x ∈ Ω, f xStar ≤ f x) ↔
    (∀ x ∈ Ω, 0 ≤ fderiv ℝ f xStar (x - xStar)) := by
  -- NB: this tactic script is reconstructed from the rendered proof states;
  -- tactics collapsed in the rendering are indicated in comments.
  constructor
  · -- (1) ⟹ (2): a global minimizer over the convex set Ω satisfies the
    -- first-order necessary condition established earlier in the lecture.
    intro hmin
    -- (closing tactic collapsed in the source rendering)
  · -- (2) ⟹ (1): combine the gradient lower bound with the sign condition.
    intro hgrad x hx
    have hlower : f x ≥ f xStar + (fderiv ℝ f xStar) (x - xStar) :=
      lem_l1_gradient_lower_bound hconv hf xStar hxStar x hx
    linarith [hgrad x hx]
1.3. Examples of Convex Functions
In the Euclidean model E = \mathbb{R}^d, given data
(a_i,b_i)_{i=1}^m with a_i \in \mathbb{R}^d, consider
\min_{x \in \mathbb{R}^d}\ \frac{1}{2m}\sum_{i=1}^m (a_i^\top x-b_i)^2.
This problem is explicit, convex, and in favorable full-rank settings has a closed-form normal-equation solution. But in large-scale settings one still uses iterative algorithms. Even a problem with a recognizable formula is therefore not automatically computationally trivial.
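Concretely (a standard computation, stated under the assumption that A denotes the m \times d matrix with rows a_i^\top), the objective is f(x) = \frac{1}{2m}\|Ax-b\|_2^2 with gradient
\nabla f(x) = \frac{1}{m}A^\top(Ax-b),
so stationarity \nabla f(x) = 0 gives the normal equations A^\top A\,x = A^\top b; when A^\top A \succ 0 (full column rank), the unique minimizer is x^\star = (A^\top A)^{-1}A^\top b.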
Given labels y_i \in \{\pm1\}, consider
\min_{x \in \mathbb{R}^d}\ \frac{1}{m}\sum_{i=1}^m
\log\!\bigl(1+e^{-y_i a_i^\top x}\bigr)+\frac{\lambda}{2}\|x\|_2^2.
This problem is again explicit and convex, but it usually has no closed-form minimizer. Its importance is computational rather than symbolic: the value and gradient are both cheap to evaluate. Convexity here does not mean a closed form; it means local information can become globally meaningful.
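For reference (our computation, consistent with the objective above), writing \sigma(u) = 1/(1+e^{-u}), the gradient is
\nabla f(x) = -\frac{1}{m}\sum_{i=1}^m y_i\,\sigma\!\bigl(-y_i a_i^\top x\bigr)\,a_i + \lambda x,
and for \lambda > 0 the objective is strongly convex, so a minimizer exists and is unique even though it admits no closed form.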
In the Euclidean model E = \mathbb{R}^n, let Q \succeq 0,
b \in \mathbb{R}^n, and let
\Omega := \{x \in \mathbb{R}^n : Cx=d,\ x \ge 0\}.
Consider the quadratic program
\min\left\{\frac12 x^\top Qx+b^\top x : x\in\Omega\right\}.
This is a convex optimization problem with both objective geometry and explicit constraints. It previews later themes all at once: constrained optimality, dual variables, and KKT conditions.
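As a preview (our sketch of material developed in later lectures, not a result proved here), the KKT conditions for this problem pair a primal point x with multipliers \nu for Cx = d and \mu \ge 0 for x \ge 0:
Qx + b + C^\top \nu - \mu = 0,\qquad Cx = d,\qquad x \ge 0,\qquad \mu \ge 0,\qquad \mu_i x_i = 0 \ \ \forall i.
Because the problem is convex, these conditions turn out to be sufficient as well as necessary under suitable constraint qualifications.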
1.4. Dependency and Proof Sketch
- The local-to-global theorem uses only the convexity inequality on the segment joining x^\star to an arbitrary x \in \Omega. If x^\star were only locally optimal and some distant x were strictly better, then convex combinations near x^\star would already contradict local optimality.
- The first-order necessary condition is proved by differentiating the function \phi(t) := f(x^\star + t(x - x^\star)) at t = 0^+, using global minimality of x^\star over a convex feasible set.
- The global linear lower bound is proved by applying convexity to obtain f\bigl(x+t(y-x)\bigr) \le (1-t)f(x) + t f(y) for t \in (0,1], rearranging, and letting t \downarrow 0.
- The first-order characterization combines the first-order necessary condition with the global linear lower bound f(y) \ge f(x) + \langle \nabla f(x), y-x \rangle.
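The one-variable computation behind the first-order necessary condition can be made explicit (our expansion of the sketch above): with \phi(t) := f(x^\star + t(x - x^\star)), the chain rule gives
\phi'(0) = \langle \nabla f(x^\star), x - x^\star \rangle,
while feasibility of x^\star + t(x - x^\star) for t \in [0,1] (by convexity of \Omega) and minimality of x^\star give \phi(t) \ge \phi(0), hence
\phi'(0) = \lim_{t \downarrow 0} \frac{\phi(t)-\phi(0)}{t} \ge 0.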
1.5. Exercises
- Give an example of a nonconvex differentiable function on \mathbb{R}^2 that has a strict local minimizer that is not global. Then explain precisely where the local-to-global theorem fails.
- Prove that if C \subseteq \mathbb{R}^n is convex and f : C \to \mathbb{R} is strictly convex, then f has at most one global minimizer on C.
- Let f(x) = \max\{x_1, x_2, 0\} on \mathbb{R}^2. Determine all global minimizers and explain why the differentiable convex first-order characterization does not apply.
- A set C \subseteq \mathbb{R}^n is midpoint-convex if \frac{x+y}{2} \in C for all x, y \in C. Prove that every convex set is midpoint-convex. Then prove that every closed midpoint-convex set is convex. Finally, give a counterexample showing that midpoint-convexity alone does not imply convexity.