UCB MATH 170 Notes
These are notes from MATH 170 at UC Berkeley. The course is taught by Franziska Weber. Note that I missed around two classes, which probably threw off the lecture numbering.
Table of Contents
- Lecture 1
- Lecture 2
- Lecture 3
- Lecture 4
- Lecture 5
- Lecture 6
- Lecture 7: Interpretations of Constrained Optimization
- Lecture 8
- Lecture 9
- Lecture 10
- Lecture 11
- Lecture 12
- Lecture 13
- Lecture 14
- Lecture 15
- Lecture 16
- Lecture 20
- Lecture $N$
Lecture 1
Lecture 2
We continue the example from last time.
Let $x_{ij}$ be the number of barrels shipped from site $i$ to market $j$ and let $c_{ij}$ be the cost of shipping 1 barrel from site $i$ to market $j$. Then the total cost is $$\sum_{i,j} c_{ij}x_{ij}$$ We have some constraints. First, each site $i$ cannot ship more than the supply $s_i$ available there: $$\sum_{j} x_{ij} \le s_i$$ Another constraint is that each market $j$ must receive at least its demand $d_j$: $$\sum_i x_{ij} \ge d_j$$ Note that we could actually use equality here, because we never want to ship more than we need to. And of course, we must have $$x_{ij} \ge 0$$ for all $i, j$. We won't draw a picture for this example, but this is a linear programming problem, which we will learn how to solve in this course.
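As an aside (not from lecture), here is a minimal sketch of how one could solve a problem of this shape numerically with `scipy.optimize.linprog`; the costs, supplies, and demands below are made-up numbers just for illustration.

```python
# A minimal sketch of the shipping problem using scipy.optimize.linprog.
# The costs, supplies, and demands are made-up illustrative numbers.
import numpy as np
from scipy.optimize import linprog

c = np.array([[4.0, 6.0, 9.0],
              [5.0, 3.0, 8.0]])        # c[i, j]: cost per barrel, site i -> market j
s = np.array([120.0, 100.0])           # s[i]: barrels available at site i
d = np.array([70.0, 60.0, 80.0])       # d[j]: barrels demanded at market j

m, n = c.shape                         # m sites, n markets; x is flattened row-major

# Supply rows: sum_j x_ij <= s_i
A_supply = np.kron(np.eye(m), np.ones(n))
# Demand rows: sum_i x_ij >= d_j, written as -sum_i x_ij <= -d_j
A_demand = -np.kron(np.ones(m), np.eye(n))

A_ub = np.vstack([A_supply, A_demand])
b_ub = np.concatenate([s, -d])

res = linprog(c.ravel(), A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(res.x.reshape(m, n), res.fun)    # optimal shipments and total cost
```

The demand constraints are flipped because `linprog` only takes upper-bound inequalities.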
Lecture 3
We start with definitions. Euclidean vector spaces, inner product spaces, and norms were defined but I omitted those definitions here.
We define the balls in $\mathbb{R}^n$: $$\overline{B}_r (x) = \overline{B}(x,r) = \{y \in \mathbb{R}^n \;|\; |x - y| \le r\}$$ $$B_r(x) = B(x,r) = \{y \in \mathbb{R}^n \;|\; |x - y| < r\}$$ $$\partial B_r(x) = \partial B(x,r) = \{y \in \mathbb{R}^n \;|\; |x - y| = r \}$$
We define matrices, transpose, and symmetric matrices.
Definition (Nonnegative Definite). If $A \in \mathbb{R}^{n\times n}$ is symmetric and satisfies $x \cdot (Ax) \ge 0$ for every $x \in \mathbb{R}^n$ (where $x \cdot (Ax) = \sum_{i,j} a_{ij} x_i x_j = x^T A x$), we say $A$ is nonnegative definite. We write $A \ge 0$.
Definition (Positive Definite). If $A \in \mathbb{R}^{n\times n}$ is symmetric and $x^T A x > 0$ for every $x \neq 0$, we say $A$ is positive definite, and write $A > 0$.
Definition (Transpose). $A \in \mathbb{R}^{n\times m}$, $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, $$y^TAx = x^T A^T y$$
We next defined partial derivative and gradient. The second order partials matrix is important enough to have a name, it is called the Hessian matrix: $$\nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f}{(\partial x_1)^2} & ... & \dfrac{\partial^2 f}{\partial x_n \partial x_1} \\ ... & & ... \\ \dfrac{\partial^2 f}{\partial x_1 \partial x_n} & ... & \dfrac{\partial^2 f}{(\partial x_n)^2}\end{bmatrix}$$
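As a quick illustration (my own, not from lecture), the gradient and Hessian can be approximated by central finite differences; the test function below is an arbitrary choice.

```python
# Finite-difference approximations of the gradient and Hessian definitions,
# on an arbitrary test function (my choice, not from the notes).
import numpy as np

def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1] + np.sin(x[1])

def gradient(f, x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def hessian(f, x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            # central difference for d^2 f / (dx_i dx_j)
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

x0 = np.array([1.0, 2.0])
print(gradient(f, x0))   # exact: (2*1 + 3*2, 3*1 + cos(2)) = (8, 3 + cos 2)
print(hessian(f, x0))    # exact: [[2, 3], [3, -sin 2]]
```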
We define function composition and chain rule. We then did some real analysis definitions.
Lecture 4
Variations
We start with unconstrained optimization. Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$.
Definition (Minimizer). $x_0 \in \mathbb{R}^n$ is a minimizer if for all $x \in \mathbb{R}^n$, $f(x_0) \le f(x)$. We write $f(x_0) = \min_{x\in\mathbb{R}^n} f(x)$.
Definition (Maximizer). $x_\wedge$ is a maximizer if $\forall x \in \mathbb{R}^n$, $f(x_\wedge) \ge f(x)$. We write $f(x_\wedge) = \max_{x\in\mathbb{R}^n} f(x)$.
Next we find properties of minimizers.
Theorem. If $x_0$ is a minimizer of $f$ and $f$ is differentiable, then $\nabla f(x_0) = 0$.
We start with the one dimensional case (where $n = 1$). Since $f(x_0 + h) \ge f(x_0)$, $$\lim_{h \rightarrow 0^+} \dfrac{f(x_0+h) - f(x_0)}{h} = f'(x_0) \ge 0$$ $$\lim_{h\rightarrow 0^-} \dfrac{f(x_0 + h) - f(x_0)}{h} = f'(x_0) \le 0$$ $$\Rightarrow f'(x_0) = 0$$ We now consider general $n$. Let $y \in \mathbb{R}^n$ and define $\phi(t) = f(x_0 + ty)$. Then $0$ is a minimizer of $\phi : \mathbb{R} \rightarrow \mathbb{R}$, so by the one dimensional case $$0 = \phi'(0) = \phi'(t)|_{t = 0} = \nabla f(x_0 + ty)\cdot y |_{t = 0} = \nabla f(x_0) \cdot y$$ Since $y$ is arbitrary, take $y = \nabla f(x_0)$ to get $|\nabla f(x_0)|^2 = 0$, so $\nabla f(x_0) = 0$.
$\blacksquare$
Definition (Critical Point). A point $x_0 \in \mathbb{R}^n$ that satisfies $\nabla f(x_0) =0$ is called a critical point for $f$ (also known as an extremal point).
Next we show that at a minimizer the Hessian is nonnegative definite. We start with the case when $n = 1$. By Taylor's theorem, $$f(x_0 + h) = f(x_0) + hf'(x_0) + \dfrac{h^2}{2} f''(z)$$ where $z \in [x_0, x_0 + h]$ if $h > 0$ and $z \in [x_0 + h, x_0]$ if $h < 0$. We have that $hf'(x_0) = 0$, so we have: $$f''(z) = \dfrac{f(x_0 + h) - f(x_0)}{h^2/2} \ge 0$$ Sending $h \rightarrow 0$, we have $$f''(x_0) = \lim_{h \rightarrow 0} f''(z) \ge 0$$
We now consider general $n$. Define $\phi(t) = f(x_0 + ty)$ for $y \in \mathbb{R}^n$. We have $\phi : \mathbb{R} \rightarrow \mathbb{R}$, and 0 is a minimizer of $\phi$. By the $n =1$ case, $\phi''(0) \ge 0$. Then $$0 \le \phi''(0) = \phi''(t) |_{t = 0} = \dfrac{d}{dt} (\nabla f(x_0 + ty) \cdot y)|_{t = 0} = y^\top \nabla^2 f(x_0 + ty)\, y |_{t = 0}$$ $$= y^\top \nabla^2 f(x_0) y = \sum_{i,j} y_i y_j \partial_{x_i} \partial_{x_j} f(x_0)$$ and thus, since $y$ is arbitrary, $\nabla^2 f(x_0) \ge 0$.
$\blacksquare$
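A small numerical illustration (my own, not from lecture) of the two necessary conditions at a minimizer, using an arbitrary quadratic example:

```python
# Check ∇f(x0) = 0 and ∇²f(x0) ⪰ 0 at the minimizer of an arbitrary quadratic
# f(x, y) = (x - 1)^2 + 2(y + 2)^2 + (x - 1)(y + 2), minimized at (1, -2).
import numpy as np

def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0) + (x[1] + 2.0),
                     4.0 * (x[1] + 2.0) + (x[0] - 1.0)])

hess_f = np.array([[2.0, 1.0],
                   [1.0, 4.0]])      # constant Hessian of this quadratic

x0 = np.array([1.0, -2.0])
print(grad_f(x0))                    # (0, 0): x0 is a critical point
print(np.linalg.eigvalsh(hess_f))    # all eigenvalues > 0, so ∇²f(x0) ≥ 0 holds
```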
Lecture 5
Today we discuss some applications and examples from Evans' lecture notes.
Consider a light ray traveling from a point $a$ to a point $b$, where the ray moves with speed $v_1$ in the medium containing $a$ and speed $v_2$ in the medium containing $b$, and the two media meet along the $x$-axis. If $d_1$ and $d_2$ are the distances traveled in each medium, then (by Fermat's principle) we wish to minimize the travel time $$\dfrac{d_1}{v_1} + \dfrac{d_2}{v_2}$$ to get the path of the light ray. Suppose the light crosses the boundary at the point $c$ with coordinates $(x,0)$, and that $a$ and $b$ have coordinates $(x_1,y_1)$ and $(x_2,y_2)$ respectively. We have: $$f(x) = \dfrac{\sqrt{(x_1 - x)^2 + y_1^2}}{v_1} + \dfrac{\sqrt{(x_2 - x)^2 + y_2^2}}{v_2}$$ so $$f'(x) = \dfrac{x - x_1}{v_1 \sqrt{(x_1 - x)^2 + y_1^2}} + \dfrac{x - x_2}{v_2 \sqrt{(x_2 - x)^2 + y_2^2}}$$ Since $x_1 < x < x_2$, writing $\alpha_1, \alpha_2$ for the angles the two segments make with the normal to the boundary, $$f'(x) = \dfrac{\sin \alpha_1}{v_1} - \dfrac{\sin \alpha_2}{v_2} = 0$$ $$\dfrac{\sin \alpha_1}{v_1} = \dfrac{\sin \alpha_2}{v_2}$$ which is Snell's law. Now suppose light reflects from point $a = (x_1, y_1)$ to point $b = (x_2, y_2)$, reflecting off a mirror at $c = (x,0)$, with constant speed $v_1$. We derive $\alpha_1 = \alpha_2$ by minimizing the travel time along the path: $$f(x) = \dfrac{\sqrt{(x_1 - x)^2 + y_1^2}}{v_1} + \dfrac{\sqrt{(x_2 - x)^2 + y_2^2}}{v_1}$$ $$f'(x) = \dfrac{1}{v_1} \left[ \dfrac{x - x_1}{\sqrt{(x_1 - x)^2 + y_1^2}} + \dfrac{x - x_2}{\sqrt{(x_2 -x)^2 + y_2^2}}\right] = \dfrac{1}{v_1} (\sin \alpha_1 - \sin \alpha_2 ) = 0$$ $$\sin \alpha_1 = \sin \alpha_2$$ and since $\alpha_1,\alpha_2 \le 90°$, $\alpha_1 = \alpha_2$.
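Here is a quick numerical check (my own) of the Snell's law derivation: minimize the travel time over the crossing point and compare the two ratios $\sin\alpha_i / v_i$; the endpoints and speeds are illustrative values.

```python
# Numeric check of Snell's law: minimize the travel time over the crossing
# point (x, 0) and verify sin(α1)/v1 = sin(α2)/v2. Data is illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

x1, y1 = 0.0, 1.0      # point a, in the medium with speed v1
x2, y2 = 3.0, -1.0     # point b, in the medium with speed v2
v1, v2 = 1.0, 0.5

def travel_time(x):
    return np.hypot(x - x1, y1) / v1 + np.hypot(x2 - x, y2) / v2

x_star = minimize_scalar(travel_time, bounds=(x1, x2), method="bounded").x

sin_a1 = (x_star - x1) / np.hypot(x_star - x1, y1)   # angles measured from the normal
sin_a2 = (x2 - x_star) / np.hypot(x2 - x_star, y2)
print(sin_a1 / v1, sin_a2 / v2)                      # the two ratios should agree
```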
Lecture 6
We now show that if $\nabla f(x_0) = 0$ and $\nabla^2 f(x_0) \succ 0$, then $x_0$ is a strict local minimizer. By Taylor's theorem, for $h \in \mathbb{R}^n$ we have: $$f(x_0 + h) = f(x_0) + h \cdot \nabla f(x_0) + \dfrac{1}{2} h^T \nabla^2 f(x_0) h + R_{x_0,2}(h)$$ where $\lim_{h\rightarrow 0} \dfrac{1}{|h|^2} R_{x_0,2}(h) = 0$, and $h \cdot \nabla f(x_0) = 0$ since $x_0$ is a critical point. Since $\nabla^2 f(x_0) \succ 0$, all of its eigenvalues are real and positive. In particular, there exists $\lambda > 0$ (the smallest eigenvalue) such that $h^T \nabla^2 f(x_0) h \ge \lambda |h|^2$.
Since $\dfrac{1}{|h|^2} R_{x_0,2}(h) \rightarrow 0$, we can find $\epsilon > 0$ such that for all $|h| < \epsilon$ we have $\dfrac{1}{|h|^2} |R_{x_0, 2}(h)| < \dfrac{\lambda}{4}$. Then $$f(x_0 + h) \ge f(x_0) + \dfrac{\lambda}{2} |h|^2- |h|^2 \dfrac{\lambda}{4} = f(x_0) + \dfrac{\lambda}{4} |h|^2$$ which is true for all $|h| < \epsilon$, and this is exactly what we had wished to show.
$\blacksquare$
(Example: least squares linear regression.) Consider data points $\{(x^k, y^k) \;|\; x^k \in \mathbb{R}^n, y^k \in \mathbb{R}$, $k = 1,...,N\}$. Minimize $f(m,b) = \sum_{k=1}^N |x^k \cdot m + b - y^k|^2$ over $m \in \mathbb{R}^n$ and $b \in \mathbb{R}$. Define the residuals $e^k = x^k \cdot m + b - y^k$. Also, $\overline{x} = \dfrac{1}{N} \sum_{k=1}^N x^k$, $\overline{y} = \dfrac{1}{N}\sum_{k=1}^N y^k$, $C_{ij} = \dfrac{1}{N} \sum_{k=1}^N (x_i^k - \overline{x}_i)(x_j^k - \overline{x}_j)$, and $d_i = \dfrac{1}{N}\sum_{k=1}^N (x_i^k - \overline{x}_i)(y^k - \overline{y})$.
Setting the partial derivatives to zero (and dropping the common factor of $2$): from $$\dfrac{\partial f}{\partial b}(m,b) = \sum_{k=1}^N (x^k \cdot m + b - y^k) = 0$$ we get $b = \overline{y} - m \cdot \overline{x}$. From $$\dfrac{\partial f}{\partial m_i}(m,b) = \sum_{k=1}^N x_i^k (x^k \cdot m + b - y^k) = 0$$ substituting $b = \overline{y} - m\cdot\overline{x}$, $$\sum_{k=1}^N x_i^k\, m \cdot (x^k - \overline{x}) + \sum_{k=1}^N x_i^k (\overline{y} - y^k) = 0$$ We also have, since $\sum_{k=1}^N (x^k - \overline{x}) = 0$ and $\sum_{k=1}^N (y^k - \overline{y}) = 0$, $$\sum_{k=1}^N \overline{x}_i\, m \cdot (x^k - \overline{x}) = 0, \qquad \sum_{k=1}^N \overline{x}_i (\overline{y} - y^k) = 0$$
Subtracting these from the previous equation, $$\sum_{k=1}^N m\cdot(x^k - \overline{x}) (x_i^k - \overline{x}_i) + \sum_{k=1}^N (\overline{y} - y^k) (x_i^k - \overline{x}_i) = 0$$ The first sum equals $N\sum_j C_{ij} m_j = N(Cm)_i$ and the second equals $-N d_i$, so $$Cm = d$$
$\blacksquare$
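A sanity check (my own) of the normal equations $Cm = d$ on synthetic data, compared against a direct least-squares fit:

```python
# Verify C m = d and b = ȳ - m·x̄ numerically on synthetic data (my own choice).
import numpy as np

rng = np.random.default_rng(0)
N, n = 200, 3
X = rng.normal(size=(N, n))                       # rows are the data points x^k
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=N)

x_bar = X.mean(axis=0)
y_bar = y.mean()
C = (X - x_bar).T @ (X - x_bar) / N               # C_ij as defined above
d = (X - x_bar).T @ (y - y_bar) / N               # d_i as defined above

m = np.linalg.solve(C, d)
b = y_bar - m @ x_bar
print(m, b)                                       # ≈ (1.5, -2.0, 0.5) and ≈ 3.0

# Compare with a direct least-squares fit of [X, 1] against y.
coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(N)]), y, rcond=None)
print(coef)                                       # same slopes and intercept
```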
We now consider Minimization with equality constraints.
Minimization with Equality Constraints
Suppose we have $f : \mathbb{R}^n \rightarrow \mathbb{R}$, $g = (g_1,...,g_m)^T : \mathbb{R}^n \rightarrow \mathbb{R}^m$. Suppose $f$ and $g$ are twice continuously differentiable. We would like to minimize $f$ with the constraint that $g = 0$. The equations $g_k(x) = 0$ are called equality constraints.
Definition. $x \in \mathbb{R}^n$ such that $g_k(x) = 0$ for all $k = 1,...,m$ is called feasible, $g_k(x) = g_k(x_1,...,x_n)$, $$\nabla g = \begin{pmatrix} \dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & ... & \dfrac{\partial g_1}{\partial x_n} \\ \dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} & ... & \dfrac{\partial g_2}{\partial x_n} \\ ... & & & \\ \dfrac{\partial g_m}{\partial x_1} & \dfrac{\partial g_m}{\partial x_2} & ... & \dfrac{\partial g_m}{\partial x_n}\end{pmatrix} \in \mathbb{R}^{m \times n}$$
Lecture 7: Interpretations of Constrained Optimization
Consider $f, g_1,..., g_m : \mathbb{R}^n \rightarrow \mathbb{R}$, $f,g = (g_1,...,g_m)^T$, twice continuously differentiable. Last time we were trying to minimize $f$ subject to the constraint that $g = 0$.
Theorem (Lagrange Multipliers). If $x_0$ minimizes $f$ subject to $g(x_0) = 0$, then there exist $\gamma_0 \in \mathbb{R}$ and $\lambda_0 = (\lambda_0^1,...,\lambda_0^m)^\top \in \mathbb{R}^m$, not all zero, such that $$\gamma_0 \nabla f(x_0) + \sum_{k=1}^m \lambda_0^k \nabla g_k(x_0) = 0$$ We will prove this later, but for now we will get some experience working with it. If $\gamma_0$ is not zero, we can divide by $\gamma_0$ to get new $\lambda_0^1,...,\lambda_0^m$ and $\gamma_0 = 1$. Note that we can also write the equation in the theorem as $$\gamma_0 \nabla f (x_0)+ \nabla g(x_0)^\top \lambda_0 = 0$$
We start with the case when $m = 1$, with a geometric interpretation. Consider a car driving on a hilly road, so that we can draw some elevation curves mapping the area, and a road going on some path along this map. The road is the constraint, the map is the $f$, and our theorem helps us find the minimum elevation of the car along this road.
We can make some observations. For one, by calculus, we know that the gradient is perpendicular to the level sets. At $x_0$ (the minimum), we know that the road is tangent to the elevation curves/level sets of $f$, so the gradients of $f$ and $g$ must be parallel (as they are perpendicular to the level sets). All of this means that $$\nabla f(x_0) = -\lambda_0 \nabla g(x_0) \Leftrightarrow \nabla f(x_0) + \lambda_0 \nabla g(x_0) = 0$$ which is exactly what our theorem said, and $\lambda_0$ is called the Lagrange multiplier.
We now take a second interpretation, using fewer variables. Again consider $m = 1$. We reduce the constrained problem above to an unconstrained problem in fewer variables. Assume $g(x) = 0$ can be written as $x_n = \phi(x')$ where $x' = (x_1,...,x_{n-1})^T$. Let $x_0' = (x_{0,1},...,x_{0,n-1})^T$, where $x_0$ solves the constrained problem. Then $x_0'$ is an unconstrained minimizer of $\tilde{f}(x') = f(x',\phi(x'))$.
We have $$\dfrac{\partial}{\partial x_i} \tilde{f}(x_0') = \dfrac{\partial f}{\partial x_i} (x_0', \phi(x_0')) + \dfrac{\partial f}{\partial x_n} (x_0', \phi(x_0')) \dfrac{\partial \phi}{\partial x_i}(x_0') = 0$$ for $i = 1,..., n-1$. Also, $g(x', \phi(x')) = 0$ for all $x'$, so differentiating and evaluating at the feasible point $x_0$ (i.e. at $x_0'$), $$\dfrac{\partial}{\partial x_i} (g(x_0', \phi(x_0'))) = \dfrac{\partial g}{\partial x_i} (x_0', \phi(x_0')) + \dfrac{\partial g}{\partial x_n} (x_0', \phi(x_0')) \dfrac{\partial \phi}{\partial x_i}(x_0') = 0$$ for $i = 1,..., n-1$. Now define $$\lambda_0 = -\dfrac{\partial f}{\partial x_n} (x_0) \left( \dfrac{\partial g}{\partial x_n}(x_0)\right)^{-1}, \qquad \text{i.e.} \qquad \dfrac{\partial f}{\partial x_n}(x_0) = -\lambda_0 \dfrac{\partial g}{\partial x_n} (x_0)$$ Substituting this into the first equation gives $$\dfrac{\partial f}{\partial x_i}(x_0) - \lambda_0 \dfrac{\partial g}{\partial x_n} (x_0) \dfrac{\partial \phi}{\partial x_i}(x_0') = 0$$ From the second equation, $$\dfrac{\partial g}{\partial x_n} (x_0) \dfrac{\partial \phi}{\partial x_i}(x_0') = -\dfrac{\partial g}{\partial x_i}(x_0)$$ Plugging this in gives $$\dfrac{\partial f}{\partial x_i} (x_0) + \lambda_0 \dfrac{\partial g}{\partial x_i} (x_0) = 0$$ for $i = 1,..., n-1$ (and, by the definition of $\lambda_0$, also for $i = n$), as desired.
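A small numerical check (my own) of the condition $\nabla f(x_0) + \lambda_0 \nabla g(x_0) = 0$ for $m = 1$: solve it together with $g = 0$ using `scipy.optimize.fsolve`. The objective and constraint are illustrative choices, not from the notes.

```python
# Solve the Lagrange system ∇f + λ∇g = 0, g = 0 for f(x,y) = x^2 + 2y^2 and
# g(x,y) = x + y - 1 (illustrative data). Expected: (2/3, 1/3), λ0 = -4/3.
import numpy as np
from scipy.optimize import fsolve

def f_grad(x):
    return np.array([2.0 * x[0], 4.0 * x[1]])

def g(x):
    return x[0] + x[1] - 1.0

def g_grad(x):
    return np.array([1.0, 1.0])

def lagrange_system(z):
    # Unknowns z = (x, y, lambda).
    x, lam = z[:2], z[2]
    return np.concatenate([f_grad(x) + lam * g_grad(x), [g(x)]])

print(fsolve(lagrange_system, x0=np.array([0.0, 0.0, 0.0])))
```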
Lecture 8
We consider the value function interpretation with $m = 1$. For $a \in \mathbb{R}$, we define the value function $v(a) = \min_{y\in \mathbb{R}^n}\{ f(y) \;|\; g(y) = a\}$. Then $v(g(x)) \le f(x)$ for every $x$, $v(g(x_0)) = v(0) = f(x_0)$, and $h(x) = f(x) - v(g(x)) \ge 0$.
Thus $h$ has a minimizer at $x = x_0$, and the constrained minimization becomes the unconstrained minimization of $h$. So $x_0$ is a critical point of $h$, and (assuming $v$ is differentiable) $$\nabla h(x_0) = \nabla f(x_0) - v'(g(x_0)) \nabla g(x_0) = \nabla f(x_0) - v'(0) \nabla g(x_0) = 0$$ Now we simply define $\lambda_0 = -v'(0)$ to get the Lagrange multiplier.
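A rough numerical check (my own) of $\lambda_0 = -v'(0)$, reusing the illustrative problem from the previous sketch ($f(x,y) = x^2 + 2y^2$, $g(x,y) = x + y - 1$, $\lambda_0 = -4/3$):

```python
# Finite-difference v'(0) for v(a) = min { f : g = a } and compare with -λ0.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

def v(a):
    cons = {"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0 - a}
    return minimize(f, x0=np.zeros(2), constraints=[cons], method="SLSQP").fun

eps = 1e-4
v_prime_0 = (v(eps) - v(-eps)) / (2.0 * eps)   # central difference for v'(0)
print(v_prime_0, -v_prime_0)                   # expect v'(0) ≈ 4/3, so λ0 ≈ -4/3
```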
I left this for me to read in the (or a) textbook because of my midterm today.
Lecture 9
We recall the Lagrange Multiplier theorem from previous classes. Essentially the idea behind the proof was that we get: $$0 = \gamma_{\alpha_j} \nabla f (x_{\alpha_j}) + \nabla g(x_{\alpha_j})^\top \lambda_{\alpha_j} + \gamma_{\alpha_j} \beta (x_{\alpha_j} - x_0)$$ for some sequence $\alpha_j$, and then $\gamma_{\alpha_j} \rightarrow \gamma_0$, $\nabla f(x_{\alpha_j}) \rightarrow \nabla f(x_0)$, $\nabla g(x_{\alpha_j})^\top \rightarrow \nabla g(x_0)^\top$, $\lambda_{\alpha_j} \rightarrow \lambda_0$, and of course $x_{\alpha_j} - x_0$ goes to 0.
Definition (Regular Point). $x_0$ is regular if $\nabla g_k(x_0)$ are linearly independent in $\mathbb{R}^n$ ($k = 1,...,m$).
The previous theorem shows that the equation holds, so we only need to show that $\gamma_0 \neq 0$ when $x_0$ is regular. Assume for contradiction that $\gamma_0 = 0$. Then $$\sum_{k=1}^m \lambda_0^k \nabla g_k(x_0) = 0$$ Since not all of $\gamma_0, \lambda_0^1, ..., \lambda_0^m$ are zero (as in the proof of the theorem) and $\gamma_0 = 0$, some $\lambda_0^k$ is nonzero, so the $\nabla g_k(x_0)$ are linearly dependent. This contradicts the regularity of $x_0$.
$\blacksquare$
We now consider the second variation.
Lemma. If $x_0$ is regular, then the $m \times m$ matrix $\nabla g(x_0)\nabla g(x_0)^\top$ is invertible. To see this, consider $y \in \mathbb{R}^m$, and assume $\nabla g(x_0) \nabla g(x_0)^\top y = 0$. We must show that $y = 0$. To do so, we take the inner product with $y$: $$0 = y^\top \nabla g(x_0) \nabla g(x_0)^\top y = (\nabla g(x_0)^\top y)^\top \nabla g(x_0)^\top y = \left| \nabla g(x_0)^\top y\right|^2$$ Now we have $\left|\nabla g(x_0)^\top y\right| = 0$, so $\nabla g(x_0)^\top y = 0$. Note that $$\nabla g(x_0)^\top = \begin{pmatrix} \dfrac{\partial g_1}{\partial x_1}(x_0) & \dfrac{\partial g_2}{\partial x_1}(x_0) & ... & \dfrac{\partial g_m}{\partial x_1} (x_0) \\ ... & & & ... \\ \dfrac{\partial g_1}{\partial x_n} (x_0) & ... & ... & \dfrac{\partial g_m}{\partial x_n}(x_0) \end{pmatrix}$$ so $$\nabla g(x_0)^\top y = \left( \sum_{k=1}^m \dfrac{\partial g_k}{\partial x_j} (x_0)y_k\right)^n_{j=1} = \sum_{k=1}^m \nabla g_k(x_0) y_k$$ Because the $\nabla g_k(x_0)$ are linearly independent, $y_k = 0$ for $k = 1,...,m$, so $y = 0$.
$\blacksquare$
Lecture 10
We return to the proof of the Lagrange Multiplier theorem. Recall the penalized functional $$F^{\alpha} (x) = f(x) + \dfrac{\alpha}{2} |g(x)|^2 + \dfrac{\beta}{2}|x-x_0|^2$$ whose minimizer $x_\alpha$ over $\overline{B}(x_0,1)$ satisfies $x_\alpha \rightarrow x_0$ as $\alpha \rightarrow \infty$. Also, $$\gamma_\alpha = (1 + \alpha^2|g(x_\alpha)|^2)^{-1/2} \rightarrow \gamma_0 \neq 0$$ so $\{\alpha|g(x_\alpha)|\}_{\alpha > 0}$ is a bounded sequence. By the Bolzano-Weierstrass Theorem, $\{\alpha g(x_\alpha)\}_{\alpha > 0}$ has a convergent subsequence: $\alpha_j g(x_{\alpha_j}) \rightarrow \lambda_0$ as $j \rightarrow \infty$ along some subsequence $\{\alpha_j\}_{j = 1}^\infty$ of $\{\alpha\}$.
Because $x_\alpha$ for $\alpha$ large enough lies in the interior of $B = \overline{B}(x_0,1)$ (and $x_\alpha$ minimizes $F^{\alpha}$ in $B$), $$\nabla^2 F^{\alpha} (x_\alpha) \succeq 0$$ $$\nabla F^{\alpha}(x_\alpha) = \nabla f(x_\alpha) + \alpha g(x_\alpha)^\top \nabla g(x_\alpha) + \beta(x_\alpha - x_0) = 0$$ $$\nabla^2 F^{\alpha} (x_\alpha) = \nabla^2f(x_\alpha) + \alpha\nabla g(x_\alpha)^\top \nabla g(x_\alpha) + \alpha \sum_{k=1}^m g_k(x_\alpha) \nabla^2 g_k(x_\alpha) + \beta I \succeq 0 \;\;\;(*)$$ From the previous Lemma, we know that $\nabla g(x_0) \nabla g(x_0)^\top$ is invertible. Therefore, since $\nabla g$ is continuous, $\nabla g(x_\alpha)\nabla g(x_\alpha)^\top$ is invertible for $\alpha$ large enough.
So for $y$ such that $\nabla g(x_0)y = 0$, we define $$z_\alpha = y - \nabla g(x_\alpha)^\top (\nabla g(x_\alpha)\nabla g(x_\alpha)^\top)^{-1}\nabla g(x_\alpha)y$$ Now $$\nabla g(x_\alpha)z_\alpha = \nabla g(x_{\alpha})y - (\nabla g(x_{\alpha}) \nabla g(x_\alpha)^\top)(\nabla g(x_\alpha)\nabla g(x_{\alpha})^\top)^{-1}\nabla g(x_\alpha)y = \nabla g(x_\alpha)y - \nabla g(x_\alpha)y = 0$$ By the definition of $z_\alpha$, the continuity of $\nabla g$, $x_\alpha \rightarrow x_0$, and $\nabla g(x_0)y = 0$, we get $z_\alpha \rightarrow y$ as $\alpha \rightarrow \infty$. We use $z_\alpha$ in $(*)$ to get $$0 \le z_\alpha^\top \nabla^2 F^\alpha (x_\alpha) z_\alpha = z_\alpha^\top(\nabla^2f(x_\alpha)+\alpha\nabla g(x_\alpha)^\top\nabla g(x_\alpha) + \alpha \sum_{k=1}^m g_k(x_\alpha) \nabla^2 g_k(x_\alpha) + \beta I)z_\alpha$$ The term with $\alpha\nabla g(x_\alpha)^\top\nabla g(x_\alpha)$ vanishes because $\nabla g(x_\alpha)z_\alpha = 0$, and $\alpha g_k(x_\alpha) \rightarrow \lambda_0^k$ along the subsequence, so in the limit $$0 \le y^\top (\nabla^2 f(x_0) + \sum_{k=1}^m \lambda_0^k\nabla^2g_k(x_0) + \beta I)y$$ Sending $\beta \rightarrow 0$ gives the desired second-order condition: $y^\top(\nabla^2 f(x_0) + \sum_{k=1}^m \lambda_0^k \nabla^2 g_k(x_0))y \ge 0$ for all $y$ with $\nabla g(x_0)y = 0$.
$\blacksquare$
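A minimal sketch (my own) of the quadratic-penalty idea behind the proof: minimize $F^\alpha = f + \frac{\alpha}{2}|g|^2$ for growing $\alpha$ and watch $x_\alpha \to x_0$ and $\alpha g(x_\alpha) \to \lambda_0$. I reuse the illustrative problem from earlier and drop the $\beta|x-x_0|^2$ term, since $x_0$ is what we are trying to find.

```python
# Quadratic penalty on f(x,y) = x^2 + 2y^2, g(x,y) = x + y - 1 (illustrative).
# Expect x_alpha -> (2/3, 1/3) and alpha * g(x_alpha) -> λ0 = -4/3.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

def g(x):
    return x[0] + x[1] - 1.0

for alpha in [1.0, 10.0, 100.0, 1000.0]:
    F = lambda x: f(x) + 0.5 * alpha * g(x) ** 2
    x_alpha = minimize(F, x0=np.zeros(2)).x
    print(alpha, x_alpha, alpha * g(x_alpha))   # last column approaches λ0
```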
Lecture 11
We continue the example from last time.
$\blacksquare$
And thus $$\nabla f(i) + \sum_{k=0}^{N+1} \lambda_0^k \nabla g_k(i) = 0$$ Since $f(i) = \sum_{k,l} r_{kl} i_{kl}^2$ (the dissipated power), $$\dfrac{\partial f}{\partial i_{kl}} = 2r_{kl} i_{kl}$$ The first equation above gives $$\dfrac{\partial f}{\partial i_{kl}} + \sum_{m=0}^{N+1} \lambda_0^m \dfrac{\partial g_m}{\partial i_{kl}} (i) = 0$$ $$2r_{kl} i_{kl} + \lambda_0^k - \lambda_0^l = 0$$ Rearranging, we have: $$-\lambda_0^k + \lambda_0^l = 2r_{kl} i_{kl}$$ Identifying the multiplier $\lambda_0^k$ with the potential $v^k$ at node $k$ (up to a constant factor), $$-v^k + v^l = 2r_{kl} i_{kl}$$ which is Ohm's law up to constants, as desired.
Lecture 12
Now we consider minimization of linear functions subject to linear constraints. We use the following notation:
- For $x = (x_1,...,x_n)^\top \in \mathbb{R}^n$, we write $x \ge 0$ if $x_i \ge 0$ for all $i$.
- For $x \in \mathbb{R}^n$, we write $x > 0$ if $x_i > 0$ for all $i$.
- For $x,y \in \mathbb{R}^n$, we write $x \ge y$ if $x_i \ge y_i$ for all $i$.
Definition (Canonical Primal Linear Programming Problem). Let $A \in \mathbb{R}^{m \times n}$. Let $b \in \mathbb{R}^m$, and $c \in \mathbb{R}^n$. Then the canonical primal linear programming problem is to find $x_0 \in \mathbb{R}^n$ such that $x_0$ minimizes $x \cdot c$ subject to $Ax = b$ and $x \ge 0$.
Definition (Feasible). If $x \in \mathbb{R}^n$ satisfies $Ax = b$ and $x \ge 0$, we say $x$ is feasible (for the primal problem).
Definition (Dual). The canonical dual problem is to find a solution $y_0 \in \mathbb{R}^m$ to the problem "maximize $y \cdot b$ subject to constraint $A^T y \le c$."
Definition (Feasible). $y$ is feasible if $A^T y \le c$.
Why are these called the primal and dual problems? The next theorem explains the relationship:
- If $x$ is feasible for the primal problem and $y$ is feasible for the dual problem, then $$b\cdot y \le c \cdot x$$
- If $x_0$ is feasible for the primal problem and $y_0$ is feasible for the dual problem and $b \cdot y_0 = c \cdot x_0$, then $x_0$ solves the primal problem and $y_0$ solves the dual problem.
For the first part, let $x$ and $y$ be feasible. Then $$b \cdot y = (Ax) \cdot y = x \cdot (A^T y) = \sum_{i = 1}^n x_i (A^Ty)_i \le \sum_{i=1}^n x_i c_i = x \cdot c$$ where the inequality uses $x \ge 0$ and $A^T y \le c$, as desired.
Now we consider the second part. Suppose $x_0$ and $y_0$ are feasible and $y_0 \cdot b = x_0 \cdot c$. Then from (i), for any $y$ that is feasible, $b \cdot y \le c \cdot x_0 = y_0 \cdot b$, so $y_0$ maximizes $b \cdot y$ over all feasible $y$. Similarly, from (i), for all $x$ that are feasible, $x \cdot c \ge b \cdot y_0 = x_0 \cdot c$, so $x_0$ minimizes among all feasible $x$ as desired.
$\blacksquare$
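A small check (my own) of this theorem with `scipy.optimize.linprog`, solving a made-up primal/dual pair and comparing $c \cdot x_0$ with $b \cdot y_0$:

```python
# Solve a canonical primal and its dual and verify c·x0 = b·y0 at the optimum.
# A, b, c are made-up numbers chosen so both problems are feasible and bounded.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 3.0])
c = np.array([1.0, 2.0, 3.0])

# Canonical primal: minimize c·x subject to Ax = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))

# Canonical dual: maximize b·y subject to A^T y <= c (y unrestricted in sign).
# linprog minimizes, so minimize -b·y and flip the sign of the optimal value.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None))

print(c @ primal.x, -dual.fun)   # both equal 6 here: c·x0 = b·y0
```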
Definition. The standard linear programming problem is to find $x_0 \in \mathbb{R}^n$ solving "minimize $c \cdot x$ subject to $Ax \ge b$ and $x \ge 0$".
Definition. The standard dual problem is to find $y_0 \in \mathbb{R}^m$ solving "maximize $b \cdot y$ subject to $A^\top y \le c$ and $y \ge 0$."
Lecture 13
The general linear programming problem: Find $x_0 \in \mathbb{R}^n$ solving "minimize $c \cdot x$ subject to $\sum_{j=1}^n a_{ij} x_j \ge b_i$ for $i \in I_1$, $\sum_{j=1}^n a_{ij}x_j = b_i$ for $i \in I_2$, and $x_j \ge 0$ for $j \in J_1$", with $I_1 \cap I_2 = \varnothing$, $I_1 \cup I_2 = I = \{1,...,m\}$, and $J_1 \subset J = \{1,...,n\}$.
Consider $|I| = m$, $|J| = n$, and let $J_2 = J \setminus J_1$ (the indices with no sign constraint). Set $$\tilde{x} = ((x_j)_{j \in J_1}, (u_j)_{j \in J_2}, (v_j)_{j \in J_2}, (z_i)_{i \in I_1})^\top \in \mathbb{R}^{|J_1| + 2|J_2| + |I_1|}$$ $$\tilde{c} = ((c_j)_{j \in J_1}, (c_j)_{j \in J_2}, (-c_j)_{j \in J_2}, 0, ..., 0)^{\top} \in \mathbb{R}^{|J_1| + 2|J_2| + |I_1|}$$ $$\tilde{b} = ((b_i)_{i \in I_1}, (b_i)_{i \in I_2})^\top \in \mathbb{R}^m$$ $$\tilde{A} = \begin{pmatrix} (a_{ij})_{i \in I_1, j \in J_1} & (a_{ij})_{i \in I_1, j \in J_2} & (-a_{ij})_{i \in I_1, j \in J_2} & -I_{|I_1|} \\ & & & \\ (a_{ij})_{i \in I_2, j \in J_1} & (a_{ij})_{i \in I_2, j \in J_2} & (-a_{ij})_{i \in I_2, j \in J_2} & 0\end{pmatrix}$$ We wish to minimize $\tilde{c} \cdot \tilde{x}$ subject to $\tilde{A}\tilde{x} = \tilde{b}$ and $\tilde{x} \ge 0$, which is a canonical problem. All of this comes from $$x_j \ge 0\;\;\; j\in J_1$$ $$x_j = u_j - v_j, \;\; u_j, v_j \ge 0 \;\;\; j \in J_2$$ $$z_i = \sum_{j=1}^n a_{ij} x_j - b_i \ge 0 \;\;\; i \in I_1$$ $$\sum_{j=1}^n a_{ij} x_j = \sum_{j \in J_1} a_{ij} x_j + \sum_{j \in J_2} a_{ij} (u_j - v_j) = \sum_{j \in J_1} a_{ij} x_j + \sum_{j \in J_2} a_{ij} u_j + \sum_{j\in J_2} (-a_{ij}) v_j$$
Why not just use the calculus tools we had before (variations, Lagrange multipliers) for linear programming?
We now find another optimality condition:
Lecture 14
We start with the proof of the Equilibrium Theorem: feasible $x$ and $y$ for the standard primal and dual problems are both optimal if and only if $y_i = 0$ whenever $\sum_{j=1}^n a_{ij}x_j > b_i$ and $x_j = 0$ whenever $\sum_{i=1}^m y_i a_{ij} < c_j$.
The dual constraints give $$\sum_{i=1}^m y_i a_{ij} \le c_j \Leftrightarrow c_j - \sum_{i = 1}^m y_i a_{ij} \ge 0$$ and since $x_j \ge 0$, $$x_j \left(c_j - \sum_{i = 1}^m y_i a_{ij}\right) \ge 0$$ for $j = 1, ..., n$. Summing over $j$, $$\sum_{j=1}^n x_j\left(c_j - \sum_{i = 1}^m y_i a_{ij}\right) \ge 0$$ $$\sum_{j=1}^n x_j c_j - \sum_{j=1}^n\sum_{i=1}^m y_i a_{ij} x_j \ge 0$$ Since $y \ge 0$ and $\sum_{j=1}^n a_{ij}x_j \ge b_i$, we have $\sum_{i=1}^m \sum_{j=1}^n y_i a_{ij} x_j \ge \sum_{i=1}^m b_i y_i$, so $$\sum_{i=1}^m b_i y_i \le \sum_{j=1}^n x_j c_j$$
If $\sum_{i=1}^m b_i y_i = \sum_{j=1}^n x_j c_j$, substitute this into the previous inequality: $$\sum_{i=1}^m b_i y_i - \sum_{j=1}^n \sum_{i=1}^m y_i a_{ij} x_j \ge 0 \;\;\Leftrightarrow\;\; \sum_{i=1}^m y_i \left(b_i - \sum_{j=1}^n a_{ij} x_j\right) \ge 0$$ By the constraints ($y_i \ge 0$ and $b_i - \sum_{j} a_{ij}x_j \le 0$), every term in the last sum is $\le 0$, so the sum must be an equality and each term is zero: $y_i(b_i - \sum_{j=1}^n a_{ij}x_j) = 0$ for every $i$. Backtracking, we similarly get $$\sum_{j=1}^n x_j \left(c_j - \sum_{i=1}^m y_ia_{ij}\right) = 0$$ where every term is $\ge 0$, so for each $j$ either $x_j = 0$ or $c_j - \sum_{i=1}^m y_ia_{ij} = 0$.
For the other direction, suppose $x_j (c_j - \sum_{i=1}^m y_i a_{ij}) = 0$ for all $j$ and $y_i (b_i - \sum_{j=1}^n a_{ij} x_j) = 0$ for all $i$. Summing the first over $j$ gives $$\sum_{j=1}^n x_j c_j = \sum_{j=1}^n \sum_{i=1}^m x_ja_{ij} y_i$$ and summing the second over $i$ gives $$\sum_{i=1}^m b_i y_i = \sum_{i=1}^m \sum_{j=1}^n y_i a_{ij}x_j$$ The two double sums are equal, so $\sum_{j=1}^n x_j c_j = \sum_{i=1}^m b_i y_i$, and by the duality theorem $x$ and $y$ are both optimal.
Basic Solutions
Definition (Basic Solution). Informally, a basic solution is a feasible solution with as many zero entries as possible. More precisely, assume $x$ solves $Ax = b$. If $x \neq 0$, call the nonzero entries $x_\alpha$, $x_\beta$, $x_\gamma$, .... Also write $A = \begin{pmatrix} a^1 & a^2 & ... & a^n\end{pmatrix}$ where $a^j$ are the columns of $A$.
$Ax = a^\alpha x_\alpha + a^\beta x_\beta + ... = \sum_{j=1}^n a^j x_j = b$. In the case that $b \neq 0$, we call $x$ a basic solution if the columns $a^\alpha, a^\beta,...$ are linearly independent. If $b = 0$, then $x = 0$ is a basic solution.
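A tiny helper (my own) reflecting this definition: check whether the columns of $A$ on the support of $x$ are linearly independent. The matrix and vectors are illustrative.

```python
# x (with Ax = b) is basic when the columns of A at the nonzero entries of x
# are linearly independent; x = 0 counts as basic.
import numpy as np

def is_basic(A, x, tol=1e-9):
    support = np.flatnonzero(np.abs(x) > tol)   # indices alpha, beta, ... of nonzero entries
    if support.size == 0:
        return True                             # the b = 0 case
    return np.linalg.matrix_rank(A[:, support]) == support.size

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
print(is_basic(A, np.array([3.0, 0.0, 1.0])))   # True: columns 1 and 3 are independent
print(is_basic(A, np.array([1.0, 1.0, 2.0])))   # False: three columns in R^2 are dependent
```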
Lecture 15
- If there is any feasible solution, there is a basic feasible solution.
- If there is any optimal solution, there is a basic optimal solution.
For (1), suppose $x$ is a feasible solution. If there are several feasible solutions, let $x$ be one with the smallest number of positive components. If $x = 0$, then it is basic by definition. Thus assume $x \neq 0$, and let $x_\alpha$, $x_\beta$, etc. be the nonzero components of $x$. We have that $Ax = a^\alpha x_\alpha + a^\beta x_\beta + ... = b$ where $a^\alpha$, $a^\beta$, etc. are the corresponding columns of $A$. We need to show that $a^\alpha$, $a^\beta$, etc. are linearly independent. Suppose for contradiction that they are not. Then there are $\theta_\alpha$, $\theta_\beta$, ... such that $\theta_\alpha a^\alpha + \theta_\beta a^\beta + ... = 0$ with at least one $\theta_j \neq 0$. Without loss of generality, at least one $\theta_j$ is positive; otherwise multiply the equation by $-1$.
Let $\lambda \in \mathbb{R}$ and subtract $\lambda$ times the previous equation from $Ax = b$ to find $$a^\alpha (x_\alpha - \lambda \theta_\alpha) + a^\beta (x_\beta - \lambda \theta_\beta) + ... = b$$ If $|\lambda|$ is small enough, then the new components $(x_\alpha - \lambda\theta_\alpha)$, $(x_\beta - \lambda\theta_\beta)$, etc. are still $\ge 0$.
Choose $\lambda$ as large as we can while still keeping all components nonnegative. If $\theta_i \le 0$, then any positive $\lambda$ is fine: $x_i - \lambda \theta_i > 0$. On the other hand, if $\theta_i > 0$, then we must have $\lambda \le \dfrac{x_i}{\theta_i}$, so that $x_i - \lambda \theta_i \ge x_i - \dfrac{x_i}{\theta_i} \theta_i = 0$. So we take $\lambda = \min\{\dfrac{x_i}{\theta_i} \;|\; x_i > 0, \theta_i > 0 \}$. Let $i = \mu$ be an index where the minimum is achieved, so $\lambda = \dfrac{x_\mu}{\theta_\mu}$. Then the $\mu$th component becomes $x_\mu - \lambda \theta_\mu = 0$.
$\tilde{x} = (0,...,x_\alpha - \lambda \theta_\alpha, 0, ..., 0, x_\beta - \theta_\beta \lambda,...,x_\mu - \theta_\mu \lambda,...)$ is feasible since $\tilde{x} \ge 0$, $A\tilde{x} = b$. But $\tilde{x}$ has one less positive entry than $x$ because $x_\mu -\theta_\mu \lambda = 0$. That contradicts the assumption. Thus, $x$ must be basic as desired.
We now consider the second part. This part is very similar, so I will leave this out.
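A sketch (my own) of the constructive step in the proof of the first part: repeatedly use a dependence among the support columns to drive another entry of $x$ to zero, until the support columns are independent. The data at the bottom is illustrative.

```python
# Reduce a feasible solution of Ax = b, x >= 0 to a basic feasible solution.
import numpy as np

def reduce_to_basic(A, x, tol=1e-9):
    x = x.astype(float).copy()
    while True:
        support = np.flatnonzero(x > tol)
        if support.size == 0:
            return x                                   # x = 0 is basic
        cols = A[:, support]
        if np.linalg.matrix_rank(cols) == support.size:
            return x                                   # support columns independent: basic
        # A null-space vector theta of the support columns: cols @ theta = 0.
        theta = np.linalg.svd(cols)[2][-1]
        if theta[np.argmax(np.abs(theta))] < 0:
            theta = -theta                             # make sure some theta_i > 0
        pos = theta > tol
        lam = np.min(x[support][pos] / theta[pos])     # largest lambda keeping x >= 0
        x[support] -= lam * theta                      # at least one entry drops to zero
        x[np.abs(x) < tol] = 0.0

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 3.0])
x_feasible = np.array([1.0, 1.0, 2.0])                 # A @ x_feasible = b, but not basic
x_basic = reduce_to_basic(A, x_feasible)
print(x_basic, A @ x_basic)                            # a basic feasible solution, still = b
```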
Lecture 16
Simplex Method
This was developed by George Dantzig. We will consider the canonical linear programming problem: Find $x \in \mathbb{R}^n$ solving "minimize $c \cdot x$ subject to $Ax = b$, $x \ge 0$ with $A \in \mathbb{R}^{m\times n}, b \in \mathbb{R}^m, c \in \mathbb{R}^n$."
There are two phases to the method:
- Phase I: Find a basic feasible solution to $Ax = b$, $x \ge 0$ or conclude there is none.
- Phase II: Given a basic feasible solution, find an optimal one, or show none exists.
We make the following non-degeneracy assumptions:
- We assume that $m < n$.
- We assume that the $m$ rows of $A$ are linearly independent, so that the rank of $A$ is $m$.
- We assume that $b$ is not a linear combination of fewer than $m$ columns of $A$.
What do these assumptions mean?
- We have more unknowns $x_j$ than constraint equations.
- Means that there are also $m$ linearly independent columns. If we didn't have this assumption, we might have redundant constraint equations or inconsistency.
- Means that a basic $x$ has $m$ nonzero entries (exactly).
Section 1.12 in Franklin shows how to do the simplex method when the non-degeneracy assumptions are not satisfied.
The proof isn't that bad, so I'll leave it out here for myself to try later.
We now go over how to do Phase I, assuming we know how to do Phase II. Assume $b_i \ge 0$ (if not, multiply that constraint equation by $-1$). We state Phase I as a minimization problem: "Minimize $z_1 + ... + z_m$ subject to $\sum_{j=1}^n a_{ij}x_j + z_i = b_i$ for $i = 1,...,m$, $x_j \ge 0$ for $j = 1,...,n$, and $z_i \ge 0$ for $i = 1,...,m$." In canonical form this is $\tilde{x} = (x,z) \in \mathbb{R}^{n+m}$, $\tilde{c} = (0,...,0,1,...,1)$, $\tilde{A} = (A, I_{m\times m})$, $\tilde{b} = b$. A feasible solution is $x_j = 0$ for $j = 1,...,n$, $z_i = b_i$ for $i = 1,...,m$, i.e. $\tilde{x} = (0,...,0,b_1,...,b_m)^\top$.
The cost of this feasible solution is $\sum_{i=1}^m b_i$. Applying Phase II to this auxiliary problem, either the minimal cost is $0$, in which case the $x$-part of an optimal solution is a feasible solution of the original problem, or the minimal cost is positive, in which case the original problem has no feasible solution.
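A minimal sketch (my own) of Phase I, with `scipy.optimize.linprog` standing in for Phase II and made-up data:

```python
# Phase I: minimize the sum of artificial variables z for Ax + z = b, x, z >= 0.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 3.0])          # already b >= 0; otherwise flip signs of rows

m, n = A.shape
A_tilde = np.hstack([A, np.eye(m)])                   # \tilde{A} = (A, I)
c_tilde = np.concatenate([np.zeros(n), np.ones(m)])   # \tilde{c} = (0,...,0,1,...,1)

res = linprog(c_tilde, A_eq=A_tilde, b_eq=b, bounds=(0, None))

if res.fun > 1e-9:
    print("no feasible solution to Ax = b, x >= 0")
else:
    print("feasible solution:", res.x[:n])            # the x-part; the z-part is 0
```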
Lecture 20
Today I'm not going to take notes here, as I am trying something new.
Lecture $N$
Welp... that didn't really work out, so here we are again.