Eigendecomposition Illustrated
This post explains the basics of the eigendecomposition, with some Python code to illustrate a simple example. For more theoretical background I'd recommend the freely available book Mathematics for Machine Learning, which covers everything needed here.
Basic Theory
- Eigendecomposition starts with the goal of finding a similar diagonal version of a matrix $A$. We want this diagonalized version of $A$ because diagonal matrices are convenient and efficient to work with!
- So a diagonal matrix $D$ that is similar to $A$ should satisfy $D=P^{-1}AP$, in accordance with the definition of matrix similarity.
- In other words, we want to find $D$ and $P$ such that $PD=AP$, where $D$ is a diagonal matrix with some scalars $\lambda_1, \dots, \lambda_n$ on the diagonal. Then we have $PD = [\lambda_1 p_1 | \dots | \lambda_n p_n]$, where $p_i$ are the columns of $P$. Similarly, we have $AP = [A p_1 | \dots | A p_n]$.
- Using these new formulations of $PD$ and $AP$, we get that $[A p_1 | \dots | A p_n] = [\lambda_1 p_1 | \dots | \lambda_n p_n]$.
- Look familiar? This equation implies a set of equalities $A p_i = \lambda_i p_i$ -- the eigenvectors and eigenvalues of $A$!
- Indeed, this shows that to construct such a diagonalization of $A$ we end up using $A$'s eigenvectors and eigenvalues.
- This brings us to the canonical form of the eigendecomposition: we can decompose $A$ as $A = PDP^{-1}$, where the columns of $P$ are the eigenvectors of $A$ and the diagonal entries of $D$ are the corresponding eigenvalues (a small hand-worked example follows this list).
- Let's jump into numpy and visualise this!
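First, though, a quick pen-and-paper example (with a deliberately simple matrix, different from the one used in the code below) to make the formula concrete. The matrix $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$ has eigenvalues $\lambda_1 = 2$, $\lambda_2 = 3$ with eigenvectors $p_1 = (1, 0)^T$, $p_2 = (1, 1)^T$, so

$$P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix},$$

and indeed

$$PDP^{-1} = \begin{pmatrix} 2 & 3 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix} = A.$$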
In [43]:
import numpy as np
import matplotlib.pyplot as plt
In [3]:
A = np.array([
    [2, 1],
    [1, 3]
])
eigvals, eigvecs = np.linalg.eig(A)
# Let's reorder these so that the eigenvalues are in decreasing order
eigvecs[:, [1,0]] = eigvecs[:, [0,1]]
eigvals[[1,0]] = eigvals[[0, 1]]
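Note that np.linalg.eig makes no guarantee about the order in which the eigenvalues come back; the manual swap above happens to work for this matrix. A more general (optional) sketch is to sort the columns by eigenvalue:
In [ ]:
# Optional, more general reordering: sort columns by eigenvalue, largest first
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]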
In [26]:
# Let's set up some helper code for visualisation
def plot_vec(ax, vec, colour='blue'):
    # Plot the vector as an arrow from the origin
    ax.arrow(0, 0, vec[0], vec[1], head_width=0.1, head_length=0.1, fc=colour, ec=colour)

# Here's a square grid of vectors we can use for illustrative purposes
sample_vecs = [np.array([i, j]) for i in np.linspace(-2, 2, 9) for j in np.linspace(-2, 2, 9)]

def format_ax(ax, x=10, y=10):
    ax.set_xlim(-x, x)
    ax.set_ylim(-y, y)
    ax.grid(True)
In [27]:
# Create a figure and axis
fig, ax = plt.subplots(figsize=(5, 5))
# Red: the eigenvectors scaled by their eigenvalues (i.e. the result of applying A to them)
plot_vec(ax, eigvals[0]*eigvecs[:,0], colour='red')
plot_vec(ax, eigvals[1]*eigvecs[:,1], colour='red')
# Blue: the (unit-length) eigenvectors themselves
plot_vec(ax, eigvecs[:,0], colour='blue')
plot_vec(ax, eigvecs[:,1], colour='blue')
format_ax(ax, x=5, y=5)
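It's also worth confirming numerically that the red arrows really are what multiplying by $A$ does to the eigenvectors, i.e. that $A p_i = \lambda_i p_i$ holds for both eigenpairs (both checks should print True):
In [ ]:
# Verify the eigenvector equation A p_i = lambda_i p_i for both eigenpairs
for i in range(2):
    print(np.allclose(A @ eigvecs[:, i], eigvals[i] * eigvecs[:, i]))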
In [28]:
# Let's visualise the effect that multiplying by A has on a grid of vectors
fig, ax = plt.subplots(figsize=(5,5))
for x in sample_vecs:
    Ax = A @ x
    plot_vec(ax, Ax, colour='red')   # red: the transformed vector A @ x
    plot_vec(ax, x, colour='blue')   # blue: the original vector x
format_ax(ax)
In [7]:
# Now let's create our eigendecomposition using the theory from above
P = eigvecs
P_inv = np.linalg.inv(eigvecs)  # Since A is symmetric, P has orthonormal columns, so P_inv equals P.T
D = np.diag(eigvals)            # diagonal matrix with the eigenvalues on the diagonal
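A quick sanity check (optional): the product $PDP^{-1}$ should reconstruct $A$ up to floating point error, and because $A$ is symmetric, $P^{-1}$ should coincide with $P^T$:
In [ ]:
# Sanity check: the decomposition reconstructs A, and P is orthogonal (P_inv equals P.T)
print(np.allclose(P @ D @ P_inv, A))
print(np.allclose(P_inv, P.T))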
In [29]:
fig, ax = plt.subplots(figsize=(5,5))
# The green arrows should land exactly where the red arrows (A @ x) did in the previous plot
for x in sample_vecs:
    PDPx = P @ D @ P_inv @ x
    plot_vec(ax, PDPx, colour='green')  # green: x transformed via the decomposition
    plot_vec(ax, x, colour='blue')      # blue: the original vector x
format_ax(ax)
In [44]:
# Let's break this transformation down into its composing parts
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
for x in sample_vecs:
    plot_vec(axes[0,0], x, colour='blue')                    # the original vectors
    plot_vec(axes[0,1], P_inv @ x, colour='green')           # expressed in the eigenvector basis
    plot_vec(axes[1,0], D @ P_inv @ x, colour='red')         # scaled by the eigenvalues
    plot_vec(axes[1,1], P @ D @ P_inv @ x, colour='orange')  # mapped back to the original basis
axes[0,0].set_title('x')
axes[0,1].set_title('P_inv @ x')
axes[1,0].set_title('D @ P_inv @ x')
axes[1,1].set_title('P @ D @ P_inv @ x')
for i in [0,1]:
    for j in [0,1]:
        format_ax(axes[i,j])
Interpreting these plots
These plots reveal to us why the eigendecomposition works:
- First, the basis of the input is changed with $P^{-1}$ so that the eigenvectors become the basis vectors. In this basis, multiplying by $D$ gives the same result as multiplying by $A$, because each coordinate now measures the component of the vector along an eigenvector, and $A$ simply scales that component by the corresponding eigenvalue.
- Next, we multiply by $D$, which performs exactly this scaling, giving (in the eigenvector basis) the vector we would have gotten by multiplying by $A$.
- Finally, all that remains is to change back to the original basis, which we achieve by multiplying by $P$, undoing the initial change of basis.
The fundamental idea is to bring the vector $x$ into a new basis where we know that multiplying by $A$ is the same as multiplying by $D$, and then bring the result back into the original basis, producing the same result as $Ax$.
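To make this concrete in code, here's the same three-step story for a single (arbitrarily chosen) vector, using the P, D and P_inv computed above:
In [ ]:
# Follow a single vector through the three steps of P @ D @ P_inv
x = np.array([1.0, 2.0])    # an arbitrary example vector
coords = P_inv @ x          # 1. express x in the eigenvector basis
scaled = D @ coords         # 2. scale each coordinate by its eigenvalue
back = P @ scaled           # 3. map the result back to the standard basis
print(back, A @ x)          # these should agree
print(np.allclose(back, A @ x))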
Written on November 5, 2023