Bayesian Methods
March 2026
Aathreya Kadambi
The Bayesian Update
The main idea behind Bayesian inference is Bayes' rule:

p(θ | D) = p(D | θ) p(θ) / p(D),

where
- D: observed data,
- θ: modeling parameters/the underlying universe state/something else depending on your philosophy,
- p(θ | D): the posterior distribution of universe states given observation of the data D,
- p(θ): a prior belief about the distribution of universe states before observing D,
- p(D | θ): the likelihood of observing some data D given the universe state being θ,
- p(D): the evidence for the data D.
The evidence term p(D) is usually computed by integrating over the joint distribution, p(D) = ∫ p(D | θ) p(θ) dθ, which is computationally expensive. As such, we often prefer the proportional form:

p(θ | D) ∝ p(D | θ) p(θ).
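The proportional form can be sketched on a discrete grid of parameter values, where the evidence p(D) is recovered as a simple normalizing sum instead of an integral. The coin-flip model below is a hypothetical illustration, not from these notes:

```python
# Grid-based Bayesian update for a coin bias theta (hypothetical example):
# posterior is proportional to likelihood * prior, then normalized so the
# normalizing sum plays the role of the evidence p(D).

def grid_posterior(thetas, prior, heads, flips):
    """Posterior over coin biases after observing `heads` in `flips` tosses."""
    # Unnormalized posterior: likelihood * prior at each grid point.
    unnorm = [
        (t ** heads) * ((1 - t) ** (flips - heads)) * p
        for t, p in zip(thetas, prior)
    ]
    evidence = sum(unnorm)  # p(D), the normalizing constant
    return [u / evidence for u in unnorm]

# 101-point grid on [0, 1] with a uniform prior; observe 7 heads in 10 flips.
thetas = [i / 100 for i in range(101)]
prior = [1 / 101] * 101
post = grid_posterior(thetas, prior, heads=7, flips=10)

# With a flat prior the posterior mode lands at the empirical frequency 7/10.
mode = thetas[max(range(101), key=lambda i: post[i])]
```

The normalization step is exactly why the proportional form suffices in practice: any computation that only needs relative posterior mass never has to evaluate p(D) separately.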
According to Andrew Gelman’s book (and Professor Alexander Strang, whose course I took), there are three important pieces to Bayesian statistics:
- Model Specification: This is a specification of what distributions will be used for the prior and likelihood.
- Bayesian Update: This is computing or approximating the posterior given by the Bayesian update.
- Model Evaluation: This is the evaluation of the model and interpretation of the results.
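The three pieces can be sketched end to end in a beta-binomial model, where the update has a closed form. The model choice and numbers here are assumptions for illustration only:

```python
# Hypothetical beta-binomial workflow illustrating the three pieces:
# specification (Beta prior + binomial likelihood), update (conjugate
# posterior in closed form), and evaluation (a simple posterior summary).

def update(alpha, beta, heads, tails):
    # Conjugacy: a Beta(a, b) prior with binomial data gives a
    # Beta(a + heads, b + tails) posterior.
    return alpha + heads, beta + tails

# 1. Model specification: Beta(2, 2) prior on the coin bias.
a, b = 2.0, 2.0

# 2. Bayesian update: observe 7 heads and 3 tails.
a, b = update(a, b, heads=7, tails=3)

# 3. Model evaluation: summarize the posterior, e.g. by its mean
# (2 + 7) / (2 + 2 + 10) = 9/14.
post_mean = a / (a + b)
```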
Each of these broad questions sits on top of large bodies of research, analysis, and thought. I hope to explore some of that in these notes.
In my opinion, Bayesian statistics is one of the most philosophically pleasing subjects one can study in life because it seems to give a very casual and stable interpretation of the absence of meaning or objective truth. It also explains several phenomena that people might initially view as signs of inexplicable complexity. It is also very flexible, allowing the user to impose their own ideas or truths while also giving enough room for the data to tell its own story.
Large Sample Theory and Convergence: The Big Data Case [Model Specification]
Vibe. In the big data case (when we have a lot of data D), the power of Bayesian inference comes from convergence: the modeling parameters θ will converge to the parameters that minimize some notion of distributional distance to the true phenomenon underlying the data.
We will make this precise with several theorems below.
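Before the theorems, the vibe can be seen numerically in an assumed beta-binomial setting (not from the notes): as the sample size grows, the posterior concentrates around the data-generating parameter and the prior washes out.

```python
# Posterior mean and variance of a Beta posterior as data accumulates
# (assumed beta-binomial model, deterministic "data" matching the true rate).

def beta_posterior(prior_a, prior_b, heads, n):
    a, b = prior_a + heads, prior_b + (n - heads)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

true_theta = 0.3
# A deliberately wrong prior, Beta(5, 5), centered at 0.5.
results = [
    beta_posterior(5.0, 5.0, heads=round(true_theta * n), n=n)
    for n in (10, 100, 10000)
]
# As n grows, the posterior mean approaches 0.3 and the variance shrinks,
# even though the prior pointed elsewhere.
```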
Importance of Prior: The Small Data Case [Model Specification]
Vibe. In the small data case (when we don’t have much data D), the power of Bayesian inference comes from the power of the user to specify an informative prior. A prior can be thought of as a model fit on virtual data points, with stronger priors corresponding to more virtual data.
We will again make this precise with several theorems below.
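The "virtual data" reading is exact in the beta-binomial setting (an assumed example, not from the notes): a Beta(a, b) prior behaves like a + b pseudo-counts, so the posterior mean is a weighted average of the prior mean and the sample frequency.

```python
# Posterior mean written as a pseudo-count weighted average (assumed
# beta-binomial model): the prior contributes a + b virtual observations.

def posterior_mean(a, b, heads, n):
    pseudo = a + b                 # strength of the prior in pseudo-counts
    prior_mean = a / pseudo
    data_mean = heads / n
    w = pseudo / (pseudo + n)      # weight on the prior
    # Algebraically identical to the exact posterior mean (a + heads)/(a + b + n).
    return w * prior_mean + (1 - w) * data_mean

# Same data (9 heads in 10 flips), two priors of different strength,
# both centered at 0.4.
strong = posterior_mean(40, 60, heads=9, n=10)  # 100 pseudo-counts: stays near 0.4
weak = posterior_mean(4, 6, heads=9, n=10)      # 10 pseudo-counts: pulled toward 0.9
```

With small n, the stronger prior dominates; with large n, the data term w → 0 recovers the convergence behavior from the big data case.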
