News

My basil finally developed roots! I'm currently reading about quantum mechanics, ferrofluids, and language models.

Blog

[08/09/2024:] Motivating Ladder Operators II
[08/08/2024:] A Cool Way to Garden
[08/08/2024:] Life and Basil Limeade
[08/05/2024:] Motivating Ladder Operators
[08/04/2024:] Migrated to AstroJS
[07/31/2024:] The Classical in the Quantum

Notes

Working on notes on quantum mechanics and derivatives, and on uploading my previous course notes to this blog!

Projects

Finally started a projects page! I've recently made some nice upgrades to my post component, so it looks pretty clean! ;)

🌊

I'm considering whether or not to continue this project using WebGL or Three.js.

I'm also researching methods for generating the 3D scenes I want for this project automatically.

In the meantime, I've decided to proceed with some preliminary prototypes of the other interactive parts of this project.

Orange Juice

I like orange juice. :)

Mlog


EE A227BT: Convex Optimization

Fall 2024
By Aathreya Kadambi
Expanding on Lectures by Professor Benjamin Recht

This fall I’m thinking of auditing this convex optimization class. According to Professor Recht, these are the problems you can solve, and your computer can solve (assuming some stuff). This class is about models that you know you are going to be able to solve; then you can work on the ones you don’t know how to solve, and that’s called research.

Story 1: A Survey

Relevant Lecture: Lecture 1

$\min f_0(x)$ subject to $x \in \mathcal{C}$, where $x \in \R^n$, $\mathcal{C} \subseteq \R^n$, and $f_0 : \R^n \rightarrow \R$.

$\max r(x)$ subject to $x \in \mathcal{C}$ is the same thing. But while these two are the same mathematically, there is a difference in mindset. The first is like minimizing a cost, whereas the latter is maximizing a reward. Some people are minimizers and others are maximizers.

Sometimes we just have a constraint like $f_0(x) = 0$, and we want to find a solution. Feasibility.

Optimization problems:

  • Circuits
  • Portfolio
  • Lecture hall assignment (minimize the distance the professors have to walk)
  • Scheduling classes
  • Traffic routes

He then introduced convex sets and convex functions. A convex set is one where every line segment between two points in the set lies in the set. A convex function is one such that every line segment between two points on its graph lies above the graph. We can get a convex set from a convex function: the set lying above the graph (the epigraph). Notice that the intersection of convex sets is convex. Let $\mathcal{E}_{f_0}$ be the epigraph of a convex function $f_0$. We can make a minimization problem: $\min t$ subject to $x \in \mathcal{C}$ and $(x,t) \in \mathcal{E}_{f_0}$.

I think if you intersect all half-planes containing a closed convex set, you get the set back, and the union of the complements of those half-planes gives you the complement, ish, of the set. For a finite collection of half-planes, we get a polyhedron $Ax \le b$, and we write the linear program: $\min c^\top x$ subject to $Ax \le b$. We will talk about this, but we will take a different view than the one taken in a linear programming class.
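As a concrete toy instance of $\min c^\top x$ subject to $Ax \le b$ (my own made-up numbers, not from lecture), SciPy's `linprog` can solve tiny LPs directly:

```python
# Hypothetical example: a tiny LP  min c^T x  s.t.  A x <= b.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 1.0])
A = np.array([[-1.0, 0.0],    # -x1 <= -1  (i.e. x1 >= 1)
              [0.0, -1.0]])   # -x2 <= -2  (i.e. x2 >= 2)
b = np.array([-1.0, -2.0])

# bounds=(None, None) removes linprog's default x >= 0 restriction
res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)
print(res.x, res.fun)  # optimum at (1, 2) with value 3
```

The optimum lands on a corner of the polyhedron, which is the geometric picture a linear programming class would emphasize.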

“People will tell you, most problems are not convex, but it’s all a mindset, bro.” - Benjamin Recht.

“I can turn any problem into a convex problem, by like rewriting it in a funny way. But I can’t always solve it.” - Benjamin Recht.

Let’s say we have a nonconvex function $g$ that we want to minimize. We can instead minimize, over all probability distributions $P$, the expected value $\mathbb{E}_P[g(x)]$. This problem is linear in $P$, and probability distributions form a convex set. But it isn’t easy to solve. “This is one trick, there are a million tricks. But it is not always possible to solve it.”
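A tiny sketch of why this trick is linear (my own discretized example, not from lecture): on a finite grid, $\mathbb{E}_p[g] = p \cdot g$ is a linear function of $p$, and a linear function over the simplex is minimized at a vertex, i.e. a point mass on the minimizer of $g$.

```python
# Discretized version of the "minimize over distributions" trick:
# E_p[g] = p . g is linear in p; the minimum over the probability
# simplex is a point mass on argmin g.
import numpy as np

xs = np.linspace(-2.0, 2.0, 401)
g = np.cos(3 * xs) + 0.5 * xs**2     # a nonconvex function

p = np.zeros_like(g)
p[np.argmin(g)] = 1.0                # point mass on the grid minimizer

# the linear problem's optimal value matches the nonconvex minimum
assert np.isclose(p @ g, g.min())
```

So convexifying didn’t make the problem easier here: finding the optimal $p$ is exactly as hard as finding the minimizer of $g$.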

“I don’t like the name of the class. It should be ‘officially solvable convex problems’.”

It’s just a notation thing, but this notation gives you a convex problem for the nonconvex problem.

Remark. Unfortunately I had to miss two stories because of some prior commitments.

Story 4: Tangent Planes and Optimality

We wish to minimize $x_0$ subject to $x \in \mathcal{C} \subseteq \R^d$.

Tangent cone, feasible cone. Some weird letter thingy is the tangent cone:

$\mathscr{T}_{x_*} = \{d : \exists \alpha > 0 \text{ with } x_* + \alpha d \in \mathcal{C}\}$

Remark. One of the hard parts about convex geometry is that in 2D, things are misleading. For example, in 2D, the convex hull of $\{x_i = \pm 1\}$ is the square. In higher dimensions it is the hypercube, which has $2^n$ corners and $2n$ supporting hyperplanes. Now the convex hull of $\{\pm e_i\}$ has $2n$ corners but $2^n$ supporting hyperplanes, despite the fact that the two look exactly the same in 2D. Basically we’re comparing a cube and a diamond thingy.

What is the set of directions that decrease the cost function?

$\mathscr{D} = \{ d : d_0 < 0 \}$,
since any direction with $d_0 < 0$ decreases the cost (we are minimizing $x_0$). So $x_*$ is optimal if $\mathscr{D} \cap \mathscr{T}_{x_*} = \varnothing$. A way to prove this is to show there is a function $h : \R^d \rightarrow \R$ with $h(x) > 0$ on $\mathscr{D}$ and $h(x) \le 0$ on $\mathscr{T}_{x_*}$.

We study convex optimization because it has a dumb-simple reason that local optimality equals global optimality. But if you’re a math nerd, it’s interesting because there are some nice algebraic and geometric ideas here.

By the global/local stuff here, does he mean the local part is that the tangent cone at the point doesn’t intersect the set of cost-decreasing directions, and the global part is the global optimality result?

Story 5: Separating hyperplanes

Theorem. If $\mathcal{C}_1$ and $\mathcal{C}_2$ are nonempty, disjoint, and convex, then $\exists a \in \R^n$ and $b \in \R$ such that $a^\top x \le b$ for all $x \in \mathcal{C}_1$ and $a^\top z \ge b$ for all $z \in \mathcal{C}_2$.

Notice that before, we were trying to find a separating hyperplane between $\mathscr{D}$ and $\mathscr{T}$.

Proof Sketch.

We assume that there are closest points $u$ and $v$ between the two sets, with $u \in \mathcal{C}_1$ and $v \in \mathcal{C}_2$. Then take the segment between them and the hyperplane orthogonal to that segment.
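The proof sketch can be played out numerically on a toy case (two disjoint balls, my own example) where the closest points $u, v$ are known in closed form: the hyperplane through the midpoint of the segment, with normal $v - u$, separates the sets.

```python
# Separating-hyperplane sketch for two disjoint balls: build the
# hyperplane from the closest points u, v, then check separation.
import numpy as np

rng = np.random.default_rng(0)
c1, r1 = np.array([0.0, 0.0]), 1.0
c2, r2 = np.array([5.0, 1.0]), 1.5

d = c2 - c1
dist = np.linalg.norm(d)
u = c1 + r1 * d / dist      # closest point of ball 1 to ball 2
v = c2 - r2 * d / dist      # closest point of ball 2 to ball 1

a = v - u                   # normal along the segment u -> v
b = a @ (u + v) / 2         # hyperplane through the midpoint

# sample points from each ball and check a.x <= b vs a.z >= b
dirs = rng.normal(size=(500, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
radii = rng.uniform(0, 1, size=(500, 1))
ball1 = c1 + r1 * radii * dirs
ball2 = c2 + r2 * radii * dirs

assert (ball1 @ a <= b + 1e-9).all()
assert (ball2 @ a >= b - 1e-9).all()
```

The existence of closest points is where closedness matters; for open or unbounded sets the infimum may not be attained, which is why the theorem statement only gives non-strict separation.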

Theorem. If $\mathcal{C}$ is closed and convex, let $S$ be the intersection of all half-spaces containing $\mathcal{C}$. Then $\mathcal{C} = S$.

Story 7: Convex Functions

Definitions for convexity:

  • For all $x, z \in \text{dom }f$ and $\theta \in [0,1]$, $f((1-\theta)x + \theta z) \le (1-\theta)f(x) + \theta f(z)$
  • For all $x, z \in \text{dom }f$, $f(z) \ge f(x) + \nabla f(x)^\top (z-x)$
  • For all $x \in \text{dom }f$, $\nabla^2 f(x) \succeq 0$

We also have some operations that preserve convexity:

  • Nonnegative linear combinations
  • Affine composition $f(Ax + b)$
  • Maximization $\max_{i \in I} f_i(x)$
  • Partial minimization $\min_z f(x,z)$
  • Composition with a scalar, convex, nondecreasing function

Who has tried the eigenvalue solver? You can put in a matrix that’s positive definite and get a negative number out. It happens especially as you get rank deficient. It is a real problem, and when you’re writing code you have to put in safety checks: if you have a negative eigenvalue that’s a factor of $10^{12}$ smaller than the largest eigenvalue, you have to treat it as zero and put in a safety check to deal with that.

Remark. Dang that’s some really good advice/wisdom ngl.

Notice that least squares, $\frac{1}{2n} \sum_{i=1}^n (a_i^\top x - b_i)^2$, is a convex function.

It should be our goal to reduce all problems to finding the least squares solution.

If $A$ is a symmetric matrix, we can define a quadratic function $f(x) = \frac{1}{2}x^\top Ax$.

Remark. Always put a $\frac{1}{2}$ in front of your quadratics. It makes your derivatives nicer.

You should have the gradient of the least squares loss and the gradient of $q(x) = \frac{1}{2}x^\top A x + b^\top x + c$ memorized.
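For reference, those gradients are $\nabla f = \frac{1}{n}A^\top(Ax-b)$ for $f(x)=\frac{1}{2n}\|Ax-b\|^2$ and $\nabla q = Ax + b$ for symmetric $A$. A quick finite-difference check (my own sanity-check script, not from lecture):

```python
# Finite-difference check of the two gradients worth memorizing:
# f(x) = (1/2n)||Ax - b||^2  has grad  (1/n) A^T (Ax - b),
# q(x) = (1/2) x^T S x + p^T x + c  (S symmetric) has grad  S x + p.
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 5
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
x = rng.normal(size=d)

f = lambda y: 0.5 / n * np.sum((A @ y - b) ** 2)
grad_f = A.T @ (A @ x - b) / n

S = rng.normal(size=(d, d)); S = (S + S.T) / 2    # symmetric
p, c = rng.normal(size=d), 0.7
q = lambda y: 0.5 * y @ S @ y + p @ y + c
grad_q = S @ x + p

eps = 1e-6
num_f = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(d)])
num_q = np.array([(q(x + eps * e) - q(x - eps * e)) / (2 * eps) for e in np.eye(d)])

assert np.allclose(num_f, grad_f, atol=1e-5)
assert np.allclose(num_q, grad_q, atol=1e-5)
```

Note how the $\frac{1}{2}$ from the previous remark is exactly what makes the factor of 2 from the chain rule disappear.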

Why do we care about the log of a determinant? For a positive definite matrix.

  • The log of the determinant of a covariance is apparently the entropy of a Gaussian or something? There’s some paper on maximizing the log of the determinant. Why do we want to maximize and not minimize? It is concave.
  • I think maybe it also tells us whether or not $X$ expands or contracts the space overall or something? Not sure tbh.

Take $X = Z + tV$, $t \in \R$:

$\log \det (Z + tV) = \log \det (Z^{1/2} (I + tZ^{-1/2} VZ^{-1/2})Z^{1/2}) = \log \det Z + \log \det (I + tZ^{-1/2}VZ^{-1/2})$
using the property that $\det (XW) = \det(X) \det (W)$. Now if we let $\mu_i$ be the eigenvalues of $Z^{-1/2}VZ^{-1/2}$, the eigenvalues of $I + tZ^{-1/2}VZ^{-1/2}$ are $1 + t\mu_i$.

Remark. It turns out that $\nabla \log \det X = X^{-1}$. You can find the Hessian of this and do it another way.
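That gradient formula is easy to check numerically: the directional derivative of $\log \det$ at $X$ along $V$ should equal $\langle X^{-1}, V\rangle = \text{Tr}(X^{-1}V)$. A small check (my own, using a random positive definite $X$):

```python
# Numerical check that grad log det X = X^{-1}: the directional
# derivative along a symmetric V should equal Tr(X^{-1} V).
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
X = M @ M.T + 4 * np.eye(4)                  # positive definite
V = rng.normal(size=(4, 4)); V = (V + V.T) / 2

t = 1e-6
_, ld_plus = np.linalg.slogdet(X + t * V)    # slogdet avoids overflow
_, ld_minus = np.linalg.slogdet(X - t * V)
numeric = (ld_plus - ld_minus) / (2 * t)
exact = np.trace(np.linalg.solve(X, V))      # Tr(X^{-1} V)

assert abs(numeric - exact) < 1e-5
```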

Remark. Why were we allowed to take $X = Z + tV$ again? Is it just because convexity can be checked by restricting the function to arbitrary lines, or something?

Let $A$ be symmetric and $\lambda(A)$ the largest eigenvalue of $A$. We expect this to be convex because it is a maximization of something.

In this example, $\lambda(A) = \max_{\|x\| = 1} x^\top A x$, and $x^\top A x$ is a linear function of $A$, so this must be convex. This is a particularly interesting example because it is not differentiable.

Functions where all sublevel sets are intervals can be minimized with a bracket search, but you have to be able to find points in the sublevel sets. The thing about this is that it is simpler in 1D than it is in higher dimensions. I think he called this quasiconvex?
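A minimal 1D bracket search, as I understand the idea (my own sketch, assuming a strictly quasiconvex function on the bracket): compare two interior probes and discard the third of the bracket that cannot contain the minimizer, since every sublevel set is an interval.

```python
# Bracket (ternary) search for a 1D quasiconvex function.
def bracket_search(f, lo, hi, tol=1e-8):
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2          # minimizer cannot lie in (m2, hi]
        else:
            lo = m1          # minimizer cannot lie in [lo, m1)
    return (lo + hi) / 2

# quasiconvex but not convex (sqrt of a distance):
x_star = bracket_search(lambda x: abs(x - 1.5) ** 0.5, -10, 10)
print(x_star)  # ~1.5
```

This only uses function comparisons, which is why it works without convexity proper; the higher-dimensional analogue needs a way to find points in sublevel sets, which is the hard part he mentions.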

The ceiling function is quasiconvex. There’s a function $\text{card}(x)$ with $x \in \R^n$, also known as the “$L^0$ norm,” which tells you the number of nonzero components of $x$.

A function is log concave if its logarithm is concave. The function $x$ is log concave, and so is $e^x$, but one of the most important examples is the Gaussian probability density function.

If you have a constraint $f(x) \ge \alpha$ and $f$ is log concave, you can rewrite it as $\log f(x) \ge \log \alpha$.

Cones

He wants to talk about convex cones. This,

$\min c^\top x$ s.t. $A_0 + \sum_{i=1}^n A_i x_i \succeq 0$

these problems are called semidefinite programming problems (SDP problems). Today we’ll see how many problems can be written like this.

Let $K$ be a convex cone: for all $x, z \in K$ and $\alpha, \beta \in \R_+$, we have $\alpha x + \beta z \in K$. $\R_+^n$ is called the orthant or nonnegative orthant.

The orthant is self-dual, which follows simply. The second-order cone is also self-dual; that follows from Cauchy-Schwarz. The positive semidefinite cone is also self-dual, or something.

$X, Z \succeq 0$, $X = \sum_{i=1}^n \lambda_i u_i u_i^\top$, $Z = \sum_{i=1}^n \mu_i v_i v_i^\top$

You should show that $\sum_{i,j} X_{ij} Z_{ij} = \text{Tr}(XZ)$ for yourself. If you expand using the eigendecompositions, you get:

$= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \mu_j \text{Tr}(u_iu_i^\top v_j v_j^\top)$

Take two PSD matrices and dot product them together and you always get a nonnegative number; that’s the cool thing.
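This is quick to check empirically (my own check): each term above is $\lambda_i \mu_j (u_i^\top v_j)^2 \ge 0$, so the trace inner product of two PSD matrices is nonnegative.

```python
# Check: for PSD X and Z, <X, Z> = Tr(XZ) >= 0, since it equals
# sum_ij lam_i mu_j (u_i . v_j)^2 with all factors nonnegative.
import numpy as np

rng = np.random.default_rng(3)
for _ in range(100):
    A = rng.normal(size=(5, 5)); X = A @ A.T   # PSD by construction
    B = rng.normal(size=(5, 5)); Z = B @ B.T   # PSD by construction
    assert np.trace(X @ Z) >= -1e-10           # nonnegative up to roundoff
```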

This is nice because if you can solve one program, you can solve another. This will end up leading to the primal dual ideas. The second order cone actually encodes quadratic programming, and the positive semidefinite cone encodes everything.

Remark. A piece of motivation is that we are heading towards the optimality condition $\langle \nabla f, z - x \rangle \ge 0$. As I was typing this he decided to erase it, and someone loudly whispered “no!”. It was really funny.

Convex Q.P.

minimize $\frac{1}{2}x^\top Q x + p^\top x$ subject to $x \in \Omega$.

minimize $t^2$ … subject to $\frac{1}{2}x^\top Q x + p^\top x \le t^2$, $x \in \Omega$

$\frac{1}{2}(x+Q^{-1}p)^\top Q (x+ Q^{-1}p) \le t^2$, i.e. $\|Q^{1/2} x + Q^{-1/2}p\| \le t$
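The completing-the-square step can be verified numerically (my own check, with random positive definite $Q$): the shifted quadratic form equals half the squared norm of $Q^{1/2}x + Q^{-1/2}p$.

```python
# Check the completing-the-square identity behind the SOC rewrite:
# (1/2)(x + Q^{-1}p)^T Q (x + Q^{-1}p) = (1/2)||Q^{1/2} x + Q^{-1/2} p||^2.
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))
Q = M @ M.T + np.eye(4)                       # positive definite
p, x = rng.normal(size=4), rng.normal(size=4)

# matrix square roots via the eigendecomposition of Q
w, U = np.linalg.eigh(Q)
Q_half = U @ np.diag(np.sqrt(w)) @ U.T
Q_neg_half = U @ np.diag(1 / np.sqrt(w)) @ U.T

shift = x + np.linalg.solve(Q, p)             # x + Q^{-1} p
lhs = 0.5 * shift @ Q @ shift
rhs = 0.5 * np.linalg.norm(Q_half @ x + Q_neg_half @ p) ** 2

assert abs(lhs - rhs) < 1e-8
```

(The constant $-\frac{1}{2}p^\top Q^{-1}p$ that completing the square produces doesn’t affect the minimizer, which is presumably why the notes drop it.)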

Don’t solve least squares using gradient descent, but we can solve least squares via a second order cone problem.

minimize $t^2$ subject to $\|Ax - b\|^2 \le t^2$. By monotonicity, this is the same as just saying $\min t$ such that $\|Ax - b\| \le t$.

Stochastic Optimization

Minimize $c^\top x$ such that $\text{Pr}(a^\top x \le b) \ge \eta$ (in other words, we’re picking $a$ out of a hat, and we want the probability of the constraint being met to be decently high). A version of Markowitz forecasting or something. Finance people like this.

Suppose $a \sim \mathcal{N}(\overline{a}, \Sigma)$. Then $a^\top x \sim \mathcal{N}(\overline{a}^\top x, x^\top \Sigma x)$.

$\text{Pr}(a^\top x \le b) = \Phi\!\left(\frac{b - \overline{a}^\top x}{\|\Sigma^{1/2} x\|_2}\right)$ or something, where $\Phi$ is the Gaussian CDF. We can then minimize $c^\top x$ such that $\overline{a}^\top x + \Phi^{-1}(\eta)\|\Sigma^{1/2}x\|_2 \le b$. This is a very frequentist problem because we’re setting a sort of confidence level.
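A Monte Carlo sanity check of that rewrite (my own toy numbers): fix $x$, set $b$ at the deterministic boundary $\overline{a}^\top x + \Phi^{-1}(\eta)\|\Sigma^{1/2}x\|_2$, and verify the chance constraint holds with probability about $\eta$.

```python
# Monte Carlo check of the chance-constraint reformulation for
# a ~ N(abar, Sigma): if  abar^T x + Phi^{-1}(eta) sqrt(x^T Sigma x) <= b,
# then Pr(a^T x <= b) >= eta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
abar = np.array([1.0, 2.0])
L = np.array([[0.5, 0.0], [0.2, 0.3]])
Sigma = L @ L.T
x = np.array([1.0, 0.5])
eta = 0.9

sd = np.sqrt(x @ Sigma @ x)                  # std dev of a^T x
b = abar @ x + norm.ppf(eta) * sd            # tightest feasible b

a = rng.multivariate_normal(abar, Sigma, size=200_000)
p_hat = np.mean(a @ x <= b)                  # empirical probability
assert p_hat >= eta - 0.01
```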

Robust optimization

Minimize $c^\top x$ subject to $a^\top x \le b$ for all $a \in \mathcal{E}$, where $\mathcal{E} = \{\overline{a} + Pu : \|u\| \le 1\}$ and $P \succeq 0$. What is the relationship between robust and stochastic problems? It is actually a very contentious point in the optimization community, because the robust people say they can solve all the stochastic people’s problems. Recht low key sides with the robusters (get it, robust, plus they bust people’s problems, so they’re busters).

The tricks on the robust side still work even if we change up $\mathcal{E}$, and we get different formulations that still solve the stochastic people’s problems, or something? You just have to worry about which sets you can describe well using convex sets, rather than dealing with stochasticity.

Duality: Concave Maximization and Convex Optimization

Fact 1: The optimal value of the primal (a minimization) is always at least the optimal value of the dual.

Branch and bound is a solution method that makes use of duality.

He thinks algebra is underappreciated. 🤨 Idk about that one. Maybe in CS.

Dualizing is somehow a projection into the space of convex problems? Maybe? Because the dual gives you a convex problem, and the dual of the dual gives you a convexification of your original problem.

Let’s look at this: $P$ is symmetric; maximize $x^\top P x$ subject to $\|x\|_2 = 1$.

Or maybe, looking at a dual: minimize $\lambda$ subject to $\lambda I - P \succeq 0$. We know how to do this; that’s just $\lambda_{\max}$ of $P$. But then if we put the corresponding eigenvector $x$ into the original problem, we immediately attain $\lambda_{\max}$, so we have strong duality and we’ve solved that nonconvex problem!
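The strong-duality claim is easy to see in code (my own check): the dual optimum is $\lambda_{\max}(P)$, and the top eigenvector is a feasible primal point achieving exactly that value, so the duality gap is zero.

```python
# Check strong duality for  max_{||x||=1} x^T P x:
# dual optimum = min{lambda : lambda I - P PSD} = lambda_max(P),
# and the top eigenvector attains it in the primal.
import numpy as np

rng = np.random.default_rng(6)
M = rng.normal(size=(6, 6))
P = (M + M.T) / 2                 # symmetric, possibly indefinite

w, V = np.linalg.eigh(P)          # eigenvalues in ascending order
lam_max, x = w[-1], V[:, -1]      # dual optimum and top eigenvector

assert abs(np.linalg.norm(x) - 1) < 1e-10   # primal feasible
assert abs(x @ P @ x - lam_max) < 1e-8      # primal value = dual value
```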



As a fun fact, it might seem like this website is flat because you're viewing it on a flat screen, but the curvature of this website actually isn't zero. ;-)

Copyright © 2024, Aathreya Kadambi

Made with Astrojs, React, and Tailwind.