News
My basil finally developed roots! I'm currently reading about quantum mechanics, ferrofluids, and language models.
Blog
[08/09/2024:] Motivating Ladder Operators II
[08/08/2024:] A Cool Way to Garden
[08/08/2024:] Life and Basil Limeade
[08/05/2024:] Motivating Ladder Operators
[08/04/2024:] Migrated to AstroJS
[07/31/2024:] The Classical in the Quantum
Notes
Working on notes on quantum mechanics and derivatives, and on uploading my previous course notes to this blog!
Projects
Finally started a projects page! I've recently made some nice upgrades to my post component, so it looks pretty clean! ;)
🌊
I'm deciding whether to continue this project with raw WebGL or with Three.js.
I'm also researching methods for generating the 3D scenes I want for this project automatically.
In the meantime, I've decided to proceed with some preliminary prototypes of the other interactive parts of this project.
Orange Juice
I like orange juice. :)
Mlog
EE A227BT: Convex Optimization
Fall 2024
By Aathreya Kadambi
Expanding on Lectures by Professor Benjamin Recht
This fall I’m thinking of auditing this convex optimization class. According to Professor Recht, these are the problems you can solve, and your computers can solve (assuming some stuff). This class is about models that you know you are going to be able to solve; then you can work on the ones you don’t know, and that’s called research.
Story 1: A Survey
Relevant Lecture: Lecture 1
Minimizing $f(x)$ and maximizing $-f(x)$ are the same thing. But while these two things are the same mathematically, there is a difference in mindset. The first is like minimizing a cost, whereas the latter is maximizing a reward. Some people are minimizers and others are maximizers.
Sometimes we just have a constraint like $g(x) \le 0$, and we want to find any solution. Feasibility.
Optimization problems:
- Circuits
- Portfolio
- Lecture hall assignment (minimize the distance the professors have to walk)
- Scheduling classes
- Traffic routes
He then introduced convex sets and convex functions. A convex set is one where every line segment between two points in the set lies in the set. A convex function is one such that every chord between two points on its graph lies above the graph. We can get a convex set from a convex function: the epigraph, the set lying on or above the graph. Notice that the intersection of convex sets is convex. Let $C$ be the epigraph of a convex function $f$. We can make a minimization problem: minimize $t$ subject to $(x, t) \in C$.
I think if you intersect all half-planes containing the set, you get the set back (at least for a closed convex set), and if you take the half-planes the other way, you get the complement-ish of the set. For a finite collection of half-planes, we get a polyhedron $\{x : Ax \le b\}$, and we write the linear program: minimize $c^T x$ subject to $Ax \le b$. We will talk about this, but we will take a different view than the one taken in a linear programming class.
“People will tell you, most problems are not convex, but it’s all a mindset, bro.” - Benjamin Recht.
“I can turn any problem into a convex problem, by like rewriting it in a funny way. But I can’t always solve it.” - Benjamin Recht.
Let’s say we have a nonconvex function. We want to minimize $f(x)$, where $f$ is not convex. We can instead minimize, over all probability distributions $p$, the expected value $\mathbb{E}_{x \sim p}[f(x)]$. This problem is linear in $p$, and probability distributions form a convex set. But it isn’t easy to solve. “This is one trick, there are a million tricks. But it is not always possible to solve it.”
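A toy numerical illustration of this trick on a finite grid (the nonconvex $f$ and the grid are made up): the objective is linear in $p$, and a point mass at the argmin achieves the minimum of $f$.

```python
# Toy illustration: over a finite grid, minimizing E_p[f(X)] over
# probability distributions p is linear in p, and its optimum is a
# point mass at the argmin of f. (Hypothetical nonconvex f.)

def f(x):
    return (x * x - 1) ** 2 + 0.3 * x   # two local minima, so nonconvex

grid = [i / 100.0 for i in range(-200, 201)]

def expected(p):
    # E_p[f] for a distribution p over the grid -- linear in p.
    return sum(pi * f(xi) for pi, xi in zip(p, grid))

uniform = [1.0 / len(grid)] * len(grid)
best = min(grid, key=f)
point_mass = [1.0 if xi == best else 0.0 for xi in grid]

# The relaxed (linear-in-p) problem recovers the original minimum.
assert abs(expected(point_mass) - min(f(x) for x in grid)) < 1e-12
assert expected(point_mass) <= expected(uniform)
```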
“I don’t like the name of the class. It should be ‘officially solvable convex problems’.”
It’s just a notation thing, but this notation gives you a convex problem for the nonconvex problem.
Remark. Unfortunately I had to miss two stories because of some prior commitments.
Story 4: Tangent Planes and Optimality
We wish to minimize $f(x)$ subject to $x \in \Omega$.
Tangent cone, feasible cone. Some weird letter thingy (a calligraphic $T_\Omega(x)$) is the tangent cone: roughly, the set of directions $d$ such that $x + td$ stays approximately feasible for small $t > 0$.
Remark. One of the hard parts about convex geometry is that in 2D, things are misleading. For example, in 2D, the convex hull of $\{(\pm 1, \pm 1)\}$ is the square. In $n$ dimensions this is the hypercube: it has $2^n$ corners and $2n$ supporting hyperplanes. Now take the convex hull of $\{\pm e_1, \dots, \pm e_n\}$: it has $2n$ corners but $2^n$ supporting hyperplanes. This is despite the fact that in 2D the two look exactly the same. Basically we’re comparing a cube and a diamond thingy.
What are the set of directions that decrease the cost function?
We study convex optimization because there’s a dumb reason that local optima are global. But if you’re a math nerd, it’s interesting because there are some nice algebraic and geometric ideas here.
By the global/local stuff here, does he mean that the local statement is that the tangent plane at a point doesn’t intersect the region where the cost decreases, and the global statement is the global optimality result?
Story 5: Separating hyperplanes
Theorem. If $C$ and $D$ are nonempty, disjoint, and convex, then there exist $a \neq 0$ and $b$ such that $a^T x \le b$ for all $x \in C$ and $a^T x \ge b$ for all $x \in D$.
Notice that before, we were trying to find a separating hyperplane between a point and a set.
Proof Sketch.
We assume that there are closest points $x \in C$ and $y \in D$ between the two sets. Then take the line segment between them and the hyperplane orthogonal to that segment.
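This construction can be played out numerically; a minimal sketch with two disjoint disks (the centers and radii are hypothetical):

```python
import math

# Sketch: separating hyperplane between two disjoint disks (convex sets),
# built from the closest pair of points as in the proof sketch.
c1, r1 = (0.0, 0.0), 1.0
c2, r2 = (4.0, 3.0), 1.0   # centers are distance 5 > r1 + r2 apart

d = math.hypot(c2[0] - c1[0], c2[1] - c1[1])
u = ((c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d)   # unit vector c1 -> c2
x = (c1[0] + r1 * u[0], c1[1] + r1 * u[1])       # closest point in disk 1
y = (c2[0] - r2 * u[0], c2[1] - r2 * u[1])       # closest point in disk 2

# Hyperplane a^T z = b: normal a = u, through the midpoint of x and y.
a = u
b = (a[0] * (x[0] + y[0]) + a[1] * (x[1] + y[1])) / 2.0

# a^T z <= b on disk 1 and a^T z >= b on disk 2 (check boundary samples).
for t in range(100):
    th = 2 * math.pi * t / 100
    p1 = (c1[0] + r1 * math.cos(th), c1[1] + r1 * math.sin(th))
    p2 = (c2[0] + r2 * math.cos(th), c2[1] + r2 * math.sin(th))
    assert a[0] * p1[0] + a[1] * p1[1] <= b + 1e-9
    assert a[0] * p2[0] + a[1] * p2[1] >= b - 1e-9
```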
Theorem. If $C$ is closed and convex, let $H$ be the intersection of all half-spaces containing $C$. Then $C = H$.
Story 7: Convex Functions
Definitions for convexity:
- For all $x, y$ and $\lambda \in [0, 1]$, $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$
- For all $x, y$, $f(y) \ge f(x) + \nabla f(x)^T (y - x)$
- For all $x$, $\nabla^2 f(x) \succeq 0$.
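A quick numerical sanity check of these characterizations on a hypothetical convex function $f(x) = x^4 + x^2$:

```python
# Check the three convexity characterizations for f(x) = x**4 + x**2,
# so f'(x) = 4x**3 + 2x and f''(x) = 12x**2 + 2 (a made-up example).
def f(x):   return x ** 4 + x ** 2
def df(x):  return 4 * x ** 3 + 2 * x
def d2f(x): return 12 * x ** 2 + 2

pts = [i / 7.0 for i in range(-20, 21)]
for x in pts:
    assert d2f(x) >= 0                     # second-order condition
    for y in pts:
        lam = 0.3
        z = lam * x + (1 - lam) * y
        # chord lies above the graph
        assert f(z) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
        # tangent line lies below the graph
        assert f(y) >= f(x) + df(x) * (y - x) - 1e-9
```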
We also have some operations that preserve convexity:
- Nonnegative linear combinations
- Affine composition
- Maximization
- Partial minimization
- Composition with a scalar, convex, nondecreasing function
Who has tried the eigenvalue solver? You can put in a matrix that’s positive definite and get a negative number out. It happens especially as you get rank deficient. It is a real problem, and when you’re writing code you have to put in safety checks: if you get a negative eigenvalue that is tiny relative to the largest eigenvalue, you have to treat it as zero and put in a safety check to deal with that.
Remark. Dang that’s some really good advice/wisdom ngl.
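That safety check might be sketched like this (the function name, tolerance, and eigenvalue lists are my own invention):

```python
# Sketch of the safety check: treat tiny negative eigenvalues of a
# numerically PSD matrix as noise and clamp them to zero, judged
# relative to the largest eigenvalue.
def clamp_eigenvalues(eigs, rel_tol=1e-10):
    lam_max = max(eigs)
    cleaned = []
    for lam in eigs:
        if lam < 0 and abs(lam) <= rel_tol * lam_max:
            cleaned.append(0.0)   # noise from a rank-deficient matrix
        elif lam < 0:
            raise ValueError("genuinely negative eigenvalue: %r" % lam)
        else:
            cleaned.append(lam)
    return cleaned

# A rank-deficient PSD matrix's zero eigenvalue often comes back as
# a tiny negative float from the solver.
assert clamp_eigenvalues([3.0, 1.0, -2.5e-16]) == [3.0, 1.0, 0.0]
```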
Notice that least squares, $\|Ax - b\|_2^2$, is a convex function.
It should be our goal to reduce all problems to finding the least squares solution.
If $Q$ is a symmetric matrix, we can define a quadratic function $f(x) = \frac{1}{2} x^T Q x + c^T x$.
Remark. Always put a $\frac{1}{2}$ in front of your quadratics. It makes your derivatives nicer.
You should have the gradient of the least squares loss, $\nabla \frac{1}{2}\|Ax - b\|^2 = A^T(Ax - b)$, and the gradient of $\frac{1}{2} x^T Q x + c^T x$, which is $Qx + c$ for symmetric $Q$, memorized.
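A finite-difference sanity check of the least-squares gradient $A^T(Ax - b)$, on a small made-up $A$ and $b$:

```python
# Check grad of 0.5 * ||Ax - b||^2 against central finite differences.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # hypothetical 3x2 matrix
b = [1.0, 0.0, -1.0]

def loss(x):
    r = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(3)]
    return 0.5 * sum(ri * ri for ri in r)

def grad(x):
    # A^T (Ax - b)
    r = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(3)]
    return [sum(A[i][j] * r[i] for i in range(3)) for j in range(2)]

x = [0.3, -0.7]
h = 1e-6
for j in range(2):
    xp = list(x); xp[j] += h
    xm = list(x); xm[j] -= h
    fd = (loss(xp) - loss(xm)) / (2 * h)
    assert abs(fd - grad(x)[j]) < 1e-4
```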
Why do we care about the log of a determinant? For a positive definite matrix.
- The log of the determinant of a covariance is apparently the entropy of a Gaussian or something? Some paper on maximizing the log of the determinant. Why do we want to maximize and not minimize? It is concave.
- I think maybe it also tells us whether the matrix expands or contracts space overall or something? Not sure tbh.
Take $f(X) = \log\det X$, a direction $V$, and $g(t) = \log\det(X + tV)$.
Remark. It turns out that $\nabla \log\det X = X^{-1}$. You can find the Hessian of this and do it another way.
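A finite-difference check of this identity on a small positive definite matrix (the values are arbitrary):

```python
import math

# Check grad(log det X) = X^{-1} for a 2x2 positive definite X,
# using entry-wise perturbations: d/dX_ij log det X = (X^{-1})_ji.
X = [[2.0, 0.5], [0.5, 1.0]]

def logdet(M):
    return math.log(M[0][0] * M[1][1] - M[0][1] * M[1][0])

det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
Xinv = [[ X[1][1] / det, -X[0][1] / det],
        [-X[1][0] / det,  X[0][0] / det]]

h = 1e-6
for i in range(2):
    for j in range(2):
        Xp = [row[:] for row in X]
        Xp[i][j] += h
        fd = (logdet(Xp) - logdet(X)) / h
        assert abs(fd - Xinv[j][i]) < 1e-4
```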
Remark. Why were we allowed to do that again? Is it just because of convexity or something?
Let $X$ be symmetric and $\lambda_{\max}(X)$ the largest eigenvalue of $X$. We expect this to be convex because it is a maximization of something.
In this example, $\lambda_{\max}(X) = \max_{\|v\| = 1} v^T X v$, and $v^T X v$ is a linear function of $X$, so this must be convex. This is a particularly interesting example because it is not differentiable.
Functions where all sublevel sets are intervals can be minimized with bracket search. But you have to be able to find points in the sublevel sets. The thing about this is that it is simpler in 1D than in higher dimensions. I think he called this quasiconvex?
The ceiling function is quasiconvex. There’s a function called $\|x\|_0$, also known as the $\ell_0$ “norm”, which tells you the number of nonzero components of $x$.
If a function is log concave, then when you take the logarithm, it is concave. Several standard examples are log concave, but one of the most important is the Gaussian probability density function.
If you have a constraint like $f(x) \ge \alpha$, then the constraint set is convex if $f$ is a log concave function.
Cones
He wants to talk about convex cones. This,
minimize $c^T x$ s.t. $Ax = b$, $x \in K$,
where $K$ is the cone of positive semidefinite matrices, gives problems called semidefinite programming problems (SDP problems). Today we’ll see how many problems can be written like this.
Let $K$ be a convex cone. The dual cone is $K^* = \{y : \langle y, x \rangle \ge 0 \text{ for all } x \in K\}$. The set $\mathbb{R}^n_+ = \{x : x_i \ge 0 \text{ for all } i\}$ is called the orthant or nonnegative orthant.
The orthant is self-dual, which follows simply. The second-order cone is also self-dual; that follows from Cauchy–Schwarz. The positive semidefinite cone is also self-dual or something.
You should show that for yourself: take two PSD matrices and dot them together (trace inner product) and you always get a nonnegative number. That’s the cool thing.
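A quick numerical check that the trace inner product of two PSD matrices is nonnegative, using rank-one PSD matrices $gg^T$ (my own construction; the general case follows by summing rank-one pieces):

```python
import random

# <A, B> = sum_ij A_ij B_ij >= 0 when A and B are PSD.
def psd_from(g):
    # g g^T for a 2-vector g: always PSD (rank one)
    return [[g[0] * g[0], g[0] * g[1]],
            [g[1] * g[0], g[1] * g[1]]]

def inner(A, B):
    return sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))

random.seed(0)
for _ in range(1000):
    A = psd_from([random.uniform(-1, 1), random.uniform(-1, 1)])
    B = psd_from([random.uniform(-1, 1), random.uniform(-1, 1)])
    # <g g^T, h h^T> = (g^T h)^2 >= 0
    assert inner(A, B) >= -1e-12
```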
This is nice because if you can solve one program, you can solve another. This will end up leading to the primal dual ideas. The second order cone actually encodes quadratic programming, and the positive semidefinite cone encodes everything.
Remark. A piece of motivation is that we are heading towards optimality conditions. As I was typing this he decided to erase it, and someone loudly whispered “no!“. It was really funny.
Convex Q.P.
minimize $\frac{1}{2} x^T Q x + c^T x$ subject to $Ax \le b$, with $Q \succeq 0$.
Don’t solve least squares using gradient descent; we can instead solve least squares via a second-order cone problem.
minimize $t$ subject to $\|Ax - b\|_2 \le t$. By monotonicity, this is the same as just minimizing $\|Ax - b\|_2^2$.
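The epigraph trick being used here can be written out (a sketch in the standard second-order cone form):

```latex
\min_x \|Ax - b\|_2
\quad\Longleftrightarrow\quad
\min_{x,\,t}\; t
\quad \text{s.t.} \quad \|Ax - b\|_2 \le t,
```

where the constraint says $(Ax - b, t)$ lies in the second-order cone, so this is an SOCP.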
Stochastic Optimization
Minimize $c^T x$ such that $\mathbb{P}(a^T x \le b) \ge 1 - \varepsilon$ (in other words, $a$ is picked out of a hat, and we want the probability of the constraint being met to be decently high). A version of Markowitz forecasting or something. Finance people like this.
Suppose $a \sim \mathcal{N}(\bar{a}, \Sigma)$. Then $a^T x \sim \mathcal{N}(\bar{a}^T x, x^T \Sigma x)$.
So the probability is $\Phi\big((b - \bar{a}^T x)/\sqrt{x^T \Sigma x}\big)$ or something, where $\Phi$ is the Gaussian CDF. We can then minimize subject to a constraint on that probability. This is a very frequentist problem because we’re setting a sort of confidence level.
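For the Gaussian case, the standard reformulation (a sketch, assuming $\varepsilon \le 1/2$) turns the chance constraint into a convex one:

```latex
\mathbb{P}(a^T x \le b) \ge 1 - \varepsilon
\quad\Longleftrightarrow\quad
\bar{a}^T x + \Phi^{-1}(1 - \varepsilon)\,\|\Sigma^{1/2} x\|_2 \le b,
```

since $a^T x \sim \mathcal{N}(\bar{a}^T x, x^T \Sigma x)$; for $\varepsilon \le 1/2$ we have $\Phi^{-1}(1 - \varepsilon) \ge 0$, so this is a second-order cone constraint.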
Robust optimization
Minimize $c^T x$ subject to $a^T x \le b$ for all $a \in \mathcal{U}$, where $\mathcal{U}$ is some uncertainty set. What is the relationship between robust and stochastic problems? It is actually a very contentious point in the optimization community, because the robust people say they can solve all the stochastic people’s problems. Recht low key sides with the robusters (get it, robust, plus they bust people’s problems, so they’re busters).
The tricks on the robust side will still work even if we change up $\mathcal{U}$, and we get different formulations that still solve the stochastic people’s problems or something? You just have to worry about what sets you can describe well using convex sets, rather than dealing with stochasticity?
Duality: Concave Maximization and Convex Optimization
Fact 1: The optimal value of the primal (a minimization) is always at least the optimal value of the dual. (Weak duality.)
Branch and bound is a solution method that makes use of duality.
He thinks algebra is underappreciated. 🤨 Idk about that one. Maybe in CS.
Dualizing is somehow a projection onto the space of convex problems? Maybe? Because the dual gives you a convex problem, and the dual of the dual gives you a convexification of your original problem.
Let’s look at this: $A$ is symmetric; maximize $x^T A x$ subject to $\|x\|^2 = 1$.
Or maybe, looking at a dual: minimize $t$ subject to $tI \succeq A$. We know how to do this: that’s just $\lambda_{\max}$ of $A$. But then if we put the corresponding eigenvector into the original problem, we immediately find $x^T A x = \lambda_{\max}(A)$, so we have strong duality and we’ve solved that nonconvex problem!
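A small numerical check of this strong-duality fact for a hypothetical symmetric $2 \times 2$ matrix, brute-forcing the nonconvex primal over the unit circle:

```python
import math

# For symmetric A, max of x^T A x over ||x|| = 1 equals lambda_max(A),
# which is also the optimal value of the dual min { t : t I >= A }.
A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical symmetric matrix

# Closed-form eigenvalues of a symmetric 2x2 matrix.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
lam_max = tr / 2 + math.sqrt((tr / 2) ** 2 - det)

# Brute-force the nonconvex primal over the unit circle.
best = max(
    (math.cos(th) ** 2 * A[0][0]
     + 2 * math.cos(th) * math.sin(th) * A[0][1]
     + math.sin(th) ** 2 * A[1][1])
    for th in (2 * math.pi * k / 100000 for k in range(100000))
)
assert abs(best - lam_max) < 1e-6   # primal optimum matches lambda_max
```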