Eigendecomposition Illustrated
This post explains the basics of the eigendecomposition, with some Python code to illustrate a simple example. For more theoretical background I'd recommend the freely available book Mathematics for Machine Learning, which covers everything needed here.
Basic Theory
- Eigendecomposition starts with the goal of finding a similar diagonal version of a matrix $A$. We want this diagonalized version of $A$ because diagonal matrices are convenient and efficient to work with!
- So a diagonal matrix $D$ that is similar to $A$ should satisfy $D=P^{-1}AP$, in accordance with the definition of matrix similarity.
- In other words, we want to find $D$ and $P$ such that $PD=AP$, where $D$ is a diagonal matrix with some scalars $\lambda_1, \dots, \lambda_n$ on the diagonal. Then we have $PD = [\lambda_1 p_1 | \dots | \lambda_n p_n]$, where $p_i$ are the columns of $P$. Similarly, we have $AP = [A p_1 | \dots | A p_n]$.
- Using these new formulations of $PD$ and $AP$, we get that $[A p_1 | \dots | A p_n] = [\lambda_1 p_1 | \dots | \lambda_n p_n]$.
- Look familiar? This equation implies a set of equalities $A p_i = \lambda_i p_i$ -- the eigenvectors and eigenvalues of $A$!
- Indeed, this shows that to construct such a diagonalization of $A$ we end up using $A$'s eigenvectors and eigenvalues.
- This brings us to the canonical form of the eigendecomposition: we can decompose $A$ as $A = PDP^{-1}$, where the columns of $P$ are the eigenvectors of $A$ and the diagonal entries of $D$ are the corresponding eigenvalues (a small hand-worked example follows this list).
- Let's jump into numpy and visualise this!
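First, though, a quick pen-and-paper example (with a deliberately simple matrix, different from the one used in the code below) to make the formula concrete. The matrix $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$ has eigenvalues $\lambda_1 = 2$, $\lambda_2 = 3$ with eigenvectors $p_1 = (1, 0)^T$, $p_2 = (1, 1)^T$, so

$$P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix},$$

and indeed

$$PDP^{-1} = \begin{pmatrix} 2 & 3 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix} = A.$$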
In [43]:
import numpy as np
import matplotlib.pyplot as plt
In [3]:
A = np.array([
    [2, 1],
    [1, 3]
])
eigvals, eigvecs = np.linalg.eig(A)
# Let's reorder these so that the eigenvalues are in decreasing order
eigvecs[:, [1,0]] = eigvecs[:, [0,1]]
eigvals[[1,0]] = eigvals[[0, 1]]
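Note that np.linalg.eig makes no guarantee about the order in which the eigenvalues come back; the manual swap above happens to work for this matrix. A more general (optional) sketch is to sort the columns by eigenvalue:
In [ ]:
# Optional, more general reordering: sort columns by eigenvalue, largest first
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]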
In [26]:
# Let's set up some helper code for visualisation
def plot_vec(ax, vec, colour='blue'):
    # Plot the vector as an arrow from the origin
    ax.arrow(0, 0, vec[0], vec[1], head_width=0.1, head_length=0.1, fc=colour, ec=colour)

# Here's a square grid of vectors we can use for illustrative purposes
sample_vecs = [np.array([i, j]) for i in np.linspace(-2, 2, 9) for j in np.linspace(-2, 2, 9)]

def format_ax(ax, x=10, y=10):
    ax.set_xlim(-x, x)
    ax.set_ylim(-y, y)
    ax.grid(True)
In [27]:
# Create a figure and axis
fig, ax = plt.subplots(figsize=(5, 5))
# Red: the eigenvectors scaled by their eigenvalues (i.e. the result of applying A to them)
plot_vec(ax, eigvals[0]*eigvecs[:,0], colour='red')
plot_vec(ax, eigvals[1]*eigvecs[:,1], colour='red')
# Blue: the (unit-length) eigenvectors themselves
plot_vec(ax, eigvecs[:,0], colour='blue')
plot_vec(ax, eigvecs[:,1], colour='blue')
format_ax(ax, x=5, y=5)
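It's also worth confirming numerically that the red arrows really are what multiplying by $A$ does to the eigenvectors, i.e. that $A p_i = \lambda_i p_i$ holds for both eigenpairs (both checks should print True):
In [ ]:
# Verify the eigenvector equation A p_i = lambda_i p_i for both eigenpairs
for i in range(2):
    print(np.allclose(A @ eigvecs[:, i], eigvals[i] * eigvecs[:, i]))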
In [28]:
# Let's visualise the effect that multiplying by A has on a grid of vectors
fig, ax = plt.subplots(figsize=(5,5))
for x in sample_vecs:
    Ax = A @ x
    plot_vec(ax, Ax, colour='red')   # red: the transformed vector A @ x
    plot_vec(ax, x, colour='blue')   # blue: the original vector x
format_ax(ax)
In [7]:
# Now let's create our eigendecomposition using the theory from above
P = eigvecs
P_inv = np.linalg.inv(eigvecs)  # Since A is symmetric, P has orthonormal columns, so P_inv equals P.T
D = np.diag(eigvals)            # diagonal matrix with the eigenvalues on the diagonal
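A quick sanity check (optional): the product $PDP^{-1}$ should reconstruct $A$ up to floating point error, and because $A$ is symmetric, $P^{-1}$ should coincide with $P^T$:
In [ ]:
# Sanity check: the decomposition reconstructs A, and P is orthogonal (P_inv equals P.T)
print(np.allclose(P @ D @ P_inv, A))
print(np.allclose(P_inv, P.T))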
In [29]:
fig, ax = plt.subplots(figsize=(5,5))
# The green arrows should land exactly where the red arrows (A @ x) did in the previous plot
for x in sample_vecs:
    PDPx = P @ D @ P_inv @ x
    plot_vec(ax, PDPx, colour='green')  # green: x transformed via the decomposition
    plot_vec(ax, x, colour='blue')      # blue: the original vector x
format_ax(ax)
In [44]:
# Let's break this transformation down into its composing parts
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
for x in sample_vecs:
    plot_vec(axes[0,0], x, colour='blue')                    # the original vectors
    plot_vec(axes[0,1], P_inv @ x, colour='green')           # expressed in the eigenvector basis
    plot_vec(axes[1,0], D @ P_inv @ x, colour='red')         # scaled by the eigenvalues
    plot_vec(axes[1,1], P @ D @ P_inv @ x, colour='orange')  # mapped back to the original basis
axes[0,0].set_title('x')
axes[0,1].set_title('P_inv @ x')
axes[1,0].set_title('D @ P_inv @ x')
axes[1,1].set_title('P @ D @ P_inv @ x')
for i in [0,1]:
    for j in [0,1]:
        format_ax(axes[i,j])
Interpreting these plots
These plots reveal to us why the eigendecomposition works:
- First, the basis of the input is changed with $P^{-1}$ so that the eigenvectors become the basis vectors. In this basis, multiplying by $D$ gives the same result as multiplying by $A$, because each coordinate now measures the component of the vector along an eigenvector, and $A$ simply scales that component by the corresponding eigenvalue.
- Next, we multiply by $D$, which performs exactly this scaling, giving (in the eigenvector basis) the vector we would have gotten by multiplying by $A$.
- Finally, all that remains is to change back to the original basis, which we achieve by multiplying by $P$, undoing the initial change of basis.
The fundamental idea is to bring the vector $x$ into a new basis where we know that multiplying by $A$ is the same as multiplying by $D$, and then bring the result back into the original basis, producing the same result as $Ax$.
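To make this concrete in code, here's the same three-step story for a single (arbitrarily chosen) vector, using the P, D and P_inv computed above:
In [ ]:
# Follow a single vector through the three steps of P @ D @ P_inv
x = np.array([1.0, 2.0])    # an arbitrary example vector
coords = P_inv @ x          # 1. express x in the eigenvector basis
scaled = D @ coords         # 2. scale each coordinate by its eigenvalue
back = P @ scaled           # 3. map the result back to the standard basis
print(back, A @ x)          # these should agree
print(np.allclose(back, A @ x))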
Written on November 5, 2023