Measurement-Efficient Bayesian Inference
You can find a video presentation of these notes here: https://www.youtube.com/watch?v=QhXiyQYkc80
Inverse Problems
Problems of the form: \[ y=f(x,n) \]
- \(y\) is an observed measurement
- \(f\) is a measurement model
- \(x\) is the target signal
- \(n\) is measurement noise
We would like to recover \(x\) from \(y\), i.e. to invert the measurement process, hence the name ‘inverse problem’. The difficulty is that \(f\) is generally not invertible.
We like linear inverse problems: \[ y = Hx + n \] where \(H \in \mathbb{R}^{N\times M}\) is not invertible because it has low rank (typically \(N < M\): fewer measurements than unknowns), so we can’t simply compute \(x \approx H^{-1}y\).
Further, because of its low rank, there are typically multiple values of \(x\) that solve \(y = Hx\), i.e. \(H\) is many-to-one.
For example:
`measured_fish = (mask * fish) + noise`
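To make that one-liner self-contained, here is a minimal NumPy sketch of the same masked forward model; the image size, mask region, and noise level are made up for illustration, and `fish` simply stands in for the target image \(x\).

```python
# Minimal sketch of the masked linear forward model y = Hx + n,
# where H acts as a binary inpainting mask (all values illustrative).
import numpy as np

rng = np.random.default_rng(0)

fish = rng.random((64, 64))               # stand-in for the true image x
mask = np.ones((64, 64))
mask[20:44, 20:44] = 0.0                  # zero out a square region: these pixels are unobserved
noise = 0.05 * rng.standard_normal((64, 64))

measured_fish = (mask * fish) + noise     # the observation y; many images x are consistent with it
```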
Bayesian Inversion
In principle, any value for \(x\) in the masked region solves \(y=Hx +n\). But we want our algorithm to inpaint this region with “plausible” solutions. Bayes’ Theorem lets us calculate exactly that:
\[ p(x \mid y) = \frac{p(y \mid x)p(x)}{p(y)} \]
\[ \text{with, e.g.:} \quad p(y \mid x) = \mathcal{N}(y; Hx, \sigma_y^2) \]
Now sampling \(x \sim p(x \mid y)\) should give plausible solutions – in particular those that have high probability under both the likelihood and the prior.
We can imagine this as ‘slicing’ out of the joint distribution only the region where \(Y=y\), and seeing which values of \(x\) co-occur with that measurement.
Our prior then serves to constrain the image space, making our problem more tractable (less many-to-one).
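To make this concrete, consider the linear-Gaussian case, where the posterior is available in closed form. The sketch below is illustrative (the shapes, noise level, and standard-normal prior are assumptions, not values from the notes): with prior \(p(x) = \mathcal{N}(0, I)\) and likelihood \(p(y \mid x) = \mathcal{N}(y; Hx, \sigma_y^2 I)\), the posterior is Gaussian with covariance \(\Sigma = (H^\top H / \sigma_y^2 + I)^{-1}\) and mean \(\mu = \Sigma H^\top y / \sigma_y^2\).

```python
# Sketch: closed-form Bayesian inversion for y = Hx + n with a Gaussian prior.
# Shapes, noise level, and the standard-normal prior are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 10                              # fewer measurements than unknowns: H is many-to-one
H = rng.standard_normal((N, M))
sigma_y = 0.1

x_true = rng.standard_normal(M)
y = H @ x_true + sigma_y * rng.standard_normal(N)

# Prior x ~ N(0, I), likelihood y | x ~ N(Hx, sigma_y^2 I)  =>  Gaussian posterior
Sigma_post = np.linalg.inv(H.T @ H / sigma_y**2 + np.eye(M))
mu_post = Sigma_post @ H.T @ y / sigma_y**2

# Samples x ~ p(x | y): each one fits the data *and* is plausible under the prior.
x_samples = rng.multivariate_normal(mu_post, Sigma_post, size=5)
```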
Measurement-Efficient Bayesian Inference
Intuition: MNIST
Where would you sample? Why?
Case Study: Accelerated MRI
Forward model for MRI: \[ y = U\mathcal{F}(x) + n \]
- \(U\) is a subsampling mask
- \(\mathcal{F}\) is the Fourier transform
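As a rough NumPy sketch of this forward model (the image size, line spacing, and noise level are assumptions for illustration, not values from the notes):

```python
# Sketch of the accelerated-MRI forward model y = U F(x) + n (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((128, 128))                     # stand-in for the true image

k_space = np.fft.fftshift(np.fft.fft2(x))      # F(x): 2-D Fourier transform of the image

U = np.zeros((128, 128))                       # subsampling mask over k-space columns ("scan lines")
U[:, ::8] = 1.0                                # keep every 8th line...
U[:, 60:68] = 1.0                              # ...plus the low-frequency centre

noise = 0.01 * (rng.standard_normal((128, 128)) + 1j * rng.standard_normal((128, 128)))
y = U * k_space + noise                        # only the selected lines are actually measured
```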
Imagine we only have 10 minutes to make our scan – which lines should we choose to get the most information about the ‘true’ image?
Another way of asking this question is: which measurements \(y\) would leave us least uncertain about \(x\)?
Fortunately, we can measure this uncertainty using information entropy. In particular, we care about the posterior entropy, \(H(x \mid y)\). Our goal then becomes:
\[ U^* = \underset{U}{\text{arg min }} H(x \mid y) \quad \text{s.t. } U \text{ contains } K \text{ scan lines.} \]
Rather than trying to optimize all lines at once, we can do this iteratively, so that we can use the information we gather along the way to make better decisions. Note that the order doesn’t affect information gain since the Bayesian updates are commutative.
Along with some other tricks (see paper), this gives a new objective:
\[ \text{scan line}_t^* = \underset{\text{scan line}_t}{\text{arg max }} [H(y_t \mid U_t, y_{0:t-1})] \]
The measurements that we expect to be high-entropy are the ones we know least about, and therefore the ones that will give us the most new information.
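One standard identity makes this precise (a sketch of the reasoning; the paper’s own derivation may differ in its details): the information a measurement carries about \(x\) is its entropy minus the entropy it would still have if we already knew \(x\), and that second term is just the entropy of the measurement noise, which is constant.

\[ I(x; y_t \mid U_t, y_{0:t-1}) = H(y_t \mid U_t, y_{0:t-1}) - \underbrace{H(y_t \mid x, U_t, y_{0:t-1})}_{=\,H(n),\ \text{constant}} \]

So choosing the line that maximizes \(H(y_t \mid U_t, y_{0:t-1})\) also maximizes the expected reduction in our uncertainty about \(x\).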
How can we compute this entropy term? The option we chose:
- Predict \(N\) possible values for \(y_t\) given \(y_{0:t-1}\) (simulation).
- Use the set \(\{y_t^{(i)}\}_{i=1}^{N}\) of simulated possibilities to approximate the distribution as a Gaussian Mixture Model \(p(y_t \mid U_t, y_{0:t-1}) \approx \sum_i w_i\, \mathcal{N}(y_t;\, y_t^{(i)}, \sigma_y^2)\).
- We can then approximate the entropy of this GMM as a function of the pairwise L2 distances between the simulated \(y_t^{(i)}\) values (one concrete realization is sketched below).
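Below is one way to realize these three steps (a sketch, not the paper’s implementation): it assumes equal weights \(w_i = 1/N\) and isotropic variance \(\sigma_y^2\), and estimates the GMM entropy by evaluating the mixture density at the simulated points themselves, so the result depends on them only through their pairwise L2 distances. The function names are mine, and the random ‘simulations’ are a stand-in for samples actually drawn conditioned on \(y_{0:t-1}\).

```python
# Sketch: pairwise-distance entropy estimate for the GMM described above,
# plus the greedy arg-max over candidate scan lines. All names/values illustrative.
import numpy as np
from scipy.special import logsumexp

def gmm_entropy_estimate(y_sims, sigma_y):
    """Plug-in entropy estimate of the mixture (1/N) * sum_i N(y; y_i, sigma_y^2 I),
    evaluated at the simulated points themselves."""
    n, d = y_sims.shape
    # Pairwise squared L2 distances ||y_i - y_j||^2
    sq_dists = np.sum((y_sims[:, None, :] - y_sims[None, :, :]) ** 2, axis=-1)
    # log N(y_i; y_j, sigma_y^2 I) for every pair (i, j)
    log_kernel = -0.5 * sq_dists / sigma_y**2 - 0.5 * d * np.log(2 * np.pi * sigma_y**2)
    # log p(y_i) under the mixture, then H ~ -mean_i log p(y_i)
    log_p = logsumexp(log_kernel, axis=1) - np.log(n)
    return -np.mean(log_p)

# Greedy selection of the next scan line (random numbers stand in for the
# N simulated y_t values per candidate line).
rng = np.random.default_rng(0)
candidate_sims = {line: rng.standard_normal((32, 128)) for line in range(10)}
next_line = max(candidate_sims, key=lambda l: gmm_entropy_estimate(candidate_sims[l], sigma_y=0.1))
```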
Diffusion Models
Noising; forward diffusion process