Eigenvalues, PCA, and Matrix Decomposition: Advanced Linear Algebra for ML
Master eigenvalues, eigenvectors, SVD, and PCA with Python. Learn dimensionality reduction and see real ML applications of advanced linear algebra concepts

The Moment Eigenvalues Clicked for Me
I was staring at a dataset with 100 features wondering: "How do I visualize this?" Then I learned about PCA (Principal Component Analysis).
PCA magically reduced my 100 features to 2, and I could finally plot my data. But how? The secret was eigenvalues and eigenvectors—concepts that seemed abstract until I saw them compress data while keeping the important patterns!
This guide will demystify these advanced topics and show you how they power real ML algorithms.
Eigenvalues and Eigenvectors: The Matrix's DNA
What Are They?
For a matrix A and vector v:
- Eigenvector: A direction that doesn't change when A is applied (only scales)
- Eigenvalue (λ): How much it scales
Equation: Av = λv
Think of it like this: Most vectors get rotated and scaled by a matrix. But eigenvectors? They only get scaled!
import numpy as np
# Simple 2×2 matrix
A = np.array([
    [4, -2],
    [1, 1]
])
# Find eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Matrix A:")
print(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"\nEigenvectors (as columns):")
print(eigenvectors)
Verifying the Eigenvalue Equation
Let's check: Does Av = λv hold?
# Take first eigenvector
v1 = eigenvectors[:, 0]
lambda1 = eigenvalues[0]
# Left side: A @ v
left_side = A @ v1
# Right side: λ * v
right_side = lambda1 * v1
print(f"Eigenvector v1: {v1}")
print(f"Eigenvalue λ1: {lambda1:.4f}")
print(f"\nA @ v1 = {left_side}")
print(f"λ1 * v1 = {right_side}")
print(f"Equal? {np.allclose(left_side, right_side)}")
The outputs match, confirming that v1 really is an eigenvector of A with eigenvalue λ1.
Why Eigenvalues Matter in ML
Eigenvalues reveal:
- Variance direction: Where data spreads most (PCA)
- System stability: Whether iterative algorithms converge
- Data patterns: Hidden structure in your dataset
Larger eigenvalue = more important direction!
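To make "larger eigenvalue = more important direction" concrete, here is a minimal sketch on a small synthetic point cloud (made up purely for illustration): the covariance matrix of data stretched along one axis has its largest eigenvalue aligned with that stretch.
import numpy as np
# Hypothetical 2D point cloud: wide along x, narrow along y
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])  # std 3 in x, 0.5 in y
# Eigen-decompose the covariance matrix
cov = np.cov(points.T)
eigenvalues, eigenvectors = np.linalg.eig(cov)
# The eigenvector with the largest eigenvalue points (roughly) along the x-axis,
# the direction of greatest spread
top = np.argmax(eigenvalues)
print("Eigenvalues:", eigenvalues)
print("Direction of most variance:", eigenvectors[:, top])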
Eigenvalue Decomposition: Breaking Down Matrices
The Decomposition Formula
For a square matrix A with n independent eigenvectors:
A = QΛQ⁻¹
Where:
- Q: Matrix with eigenvectors as columns
- Λ: Diagonal matrix with eigenvalues
- Q⁻¹: Inverse of Q
# Eigenvalue decomposition
A = np.array([[3, 1], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
# Build the components
Q = eigenvectors
Lambda = np.diag(eigenvalues)
Q_inv = np.linalg.inv(Q)
print("Original matrix A:")
print(A)
print("\nQ (eigenvectors):")
print(Q)
print("\nΛ (eigenvalues on diagonal):")
print(Lambda)
print("\nQ⁻¹:")
print(Q_inv)
# Reconstruct: A = Q @ Λ @ Q⁻¹
A_reconstructed = Q @ Lambda @ Q_inv
print("\nReconstructed A:")
print(A_reconstructed)
print(f"Matches original? {np.allclose(A, A_reconstructed)}")
Why This Matters: Decomposition reveals the "true nature" of a transformation!
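Here is one concrete payoff, sketched by reusing Q, Lambda, and Q_inv from the block above: because Λ is diagonal, matrix powers collapse to powers of the eigenvalues (Aⁿ = QΛⁿQ⁻¹).
# Powers of A via the decomposition (reuses Q, Lambda, Q_inv from above)
# Lambda is diagonal, so Lambda**5 just raises each eigenvalue to the 5th power
A_power_5 = Q @ (Lambda**5) @ Q_inv
print("A^5 via eigen-decomposition:")
print(A_power_5)
print("A^5 via repeated multiplication:")
print(np.linalg.matrix_power(A, 5))
print(f"Match? {np.allclose(A_power_5, np.linalg.matrix_power(A, 5))}")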
Singular Value Decomposition (SVD): The Ultimate Tool
SVD Works on ANY Matrix!
Unlike eigenvalue decomposition (only square matrices), SVD works on any matrix (even non-square).
A = UΣVᵀ
Where:
- U: Left singular vectors (m×m)
- Σ: Diagonal matrix of singular values (m×n)
- Vᵀ: Right singular vectors, transposed (n×n)
# SVD on non-square matrix
A = np.array([
    [3, 1, 1],
    [1, 3, 1]
])
print(f"Original matrix (2×3):")
print(A)
# Perform SVD
U, singular_values, Vt = np.linalg.svd(A)
print(f"\nU shape: {U.shape}")
print(f"Singular values: {singular_values}")
print(f"Vt shape: {Vt.shape}")
# Create Sigma matrix (same shape as A)
Sigma = np.zeros_like(A, dtype=float)
np.fill_diagonal(Sigma, singular_values)
print(f"\nΣ matrix:")
print(Sigma)
# Reconstruct: A = U @ Σ @ Vᵀ
A_reconstructed = U @ Sigma @ Vt
print(f"\nReconstructed A:")
print(A_reconstructed)
print(f"Matches? {np.allclose(A, A_reconstructed)}")
What Are Singular Values?
Singular values tell you how important each component is:
- Largest singular value: Most important pattern
- Smallest singular value: Least important (often noise)
Key insight: You can often drop small singular values and keep 95%+ of information!
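One quick way to check this is to look at the "energy" each component carries (its squared singular value as a fraction of the total). A minimal sketch, using a small matrix made up for illustration:
import numpy as np
# How much energy (sum of squared singular values) the top components keep
M = np.array([
    [5.0, 4.9, 0.1],
    [4.8, 5.1, 0.2],
    [0.1, 0.2, 0.05]
])
U, s, Vt = np.linalg.svd(M)
energy = s**2 / np.sum(s**2)
print("Singular values:", s)
print("Energy per component:", energy)
print(f"Top component alone keeps {energy[0]*100:.1f}% of the energy")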
PCA: Dimensionality Reduction in Action
What is PCA?
Principal Component Analysis finds the directions (principal components) of maximum variance in your data. It's like finding the "best view" of your data.
Steps:
- Center the data (subtract mean)
- Compute covariance matrix
- Find eigenvalues & eigenvectors
- Sort by eigenvalues (largest first)
- Project data onto top k eigenvectors
Implementing PCA from Scratch
Let's reduce 3D data to 2D!
# Create sample dataset (5 samples, 3 features)
data = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 0.2],
    [2.2, 2.9, 1.5],
    [1.9, 2.2, 1.2],
    [3.1, 3.0, 1.8]
])
print("Original data (5 samples × 3 features):")
print(data)
print(f"Shape: {data.shape}")
# Step 1: Center the data
mean = np.mean(data, axis=0)
data_centered = data - mean
print(f"\nMean: {mean}")
print("Centered data:")
print(data_centered)
# Step 2: Covariance matrix
cov_matrix = np.cov(data_centered.T)
print(f"\nCovariance matrix:")
print(cov_matrix)
# Step 3: Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Step 4: Sort by eigenvalues (descending)
sorted_idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_idx]
eigenvectors = eigenvectors[:, sorted_idx]
print(f"\nEigenvalues (sorted): {eigenvalues}")
print("Principal components (eigenvectors):")
print(eigenvectors)
# Step 5: Calculate explained variance
explained_variance = eigenvalues / np.sum(eigenvalues)
print("\nExplained variance ratio:")
for i, var in enumerate(explained_variance):
    print(f" PC{i+1}: {var*100:.2f}%")
cumulative = np.cumsum(explained_variance)
print(f"Cumulative: {cumulative}")
# Step 6: Project onto first 2 principal components
n_components = 2
principal_components = eigenvectors[:, :n_components]
data_pca = data_centered @ principal_components
print(f"\nReduced data (5 samples × 2 components):")
print(data_pca)
print(f"Shape: {data_pca.shape}")
print(f"\nPreserved {cumulative[n_components-1]*100:.1f}% of variance!")
Understanding PCA Results
Explained variance tells you how much information each component captures:
- PC1: 96% → Most data variation in this direction
- PC2: 3% → Second most variation
- PC3: 1% → Minimal variation (can drop!)
By keeping just PC1 and PC2, we reduced dimensions from 3 to 2 while keeping ~99% of information!
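If you happen to have scikit-learn installed (an assumption; it is not required anywhere else in this guide), you can sanity-check the from-scratch results against its built-in PCA:
# Optional cross-check with scikit-learn (assumes scikit-learn is installed)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
data_sklearn = pca.fit_transform(data)  # 'data' and 'data_pca' come from the example above
print("Explained variance ratio (sklearn):", pca.explained_variance_ratio_)
# Components can differ by sign, so compare absolute values
print("Matches the manual PCA?", np.allclose(np.abs(data_sklearn), np.abs(data_pca)))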
Real-World Application: Image Compression with SVD
Images are matrices of pixels. SVD can compress them!
# Create a simple "image" (30×30 matrix)
x = np.linspace(0, 10, 30)
y = np.linspace(0, 10, 30)
X, Y = np.meshgrid(x, y)
# Generate pattern
image = np.sin(X) * np.cos(Y) + 0.5 * np.sin(2*X)
image += np.random.randn(30, 30) * 0.1 # Add noise
print(f"Original image shape: {image.shape}")
print(f"Original storage: {image.size} values")
# Perform SVD
U, s, Vt = np.linalg.svd(image)
print(f"\nNumber of singular values: {len(s)}")
print(f"Top 5 singular values: {s[:5]}")
# Compress by keeping only k largest singular values
def compress_image(U, s, Vt, k):
    """Reconstruct image using only k components"""
    U_k = U[:, :k]
    s_k = s[:k]
    Vt_k = Vt[:k, :]
    return U_k @ np.diag(s_k) @ Vt_k
# Try different compression levels
for k in [2, 5, 10, 15]:
    compressed = compress_image(U, s, Vt, k)
    # Calculate compression ratio and error
    original_storage = image.shape[0] * image.shape[1]
    compressed_storage = k * (image.shape[0] + image.shape[1] + 1)
    ratio = (1 - compressed_storage/original_storage) * 100
    error = np.linalg.norm(image - compressed) / np.linalg.norm(image)
    print(f"\nk={k}:")
    print(f" Storage: {compressed_storage} values ({ratio:.1f}% reduction)")
    print(f" Error: {error*100:.2f}%")
Impressive result: with just k=10 components (about a 32% storage reduction here), the error is only around 8%!
This is the same low-rank idea behind lossy image compression: keep the important components and drop the rest. (JPEG itself uses a related transform, the DCT, rather than SVD.)
Matrix Factorization: Recommendation Systems
Netflix, Amazon, Spotify all use matrix factorization to recommend items!
The Recommendation Problem
You have a ratings matrix (users × items) with many missing values. Goal: Predict missing ratings.
Solution: Factor the matrix into User Features × Item Features
# User-item rating matrix (0 = not rated)
ratings = np.array([
    [5, 4, 0, 0, 1, 0],  # User 1
    [4, 0, 0, 0, 1, 0],  # User 2
    [0, 0, 5, 4, 0, 0],  # User 3
    [0, 0, 4, 5, 0, 1],  # User 4
    [0, 1, 0, 0, 4, 5]   # User 5
])
print("Original ratings (5 users × 6 items):")
print(ratings)
print("(0 = not rated)")
# Replace 0s with the mean rating before SVD (cast to float so the mean isn't truncated)
mask = ratings > 0
mean_rating = ratings[mask].mean()
ratings_filled = ratings.astype(float)
ratings_filled[ratings_filled == 0] = mean_rating
print(f"\nMean rating: {mean_rating:.2f}")
# Perform SVD
U, s, Vt = np.linalg.svd(ratings_filled, full_matrices=False)
# Keep only k=2 latent factors
k = 2
U_k = U[:, :k]
s_k = s[:k]
Vt_k = Vt[:k, :]
# Reconstruct ratings
ratings_predicted = U_k @ np.diag(s_k) @ Vt_k
print(f"\nReduced to {k} latent factors")
print("Predicted ratings:")
print(np.round(ratings_predicted, 2))
# Example predictions for unrated items
print("\nPredictions for originally unrated items:")
print(f"User 1, Item 3: {ratings_predicted[0, 2]:.2f}")
print(f"User 3, Item 1: {ratings_predicted[2, 0]:.2f}")
How it works:
- U represents user preferences (e.g., how much a user likes action vs. romance)
- Vt represents item characteristics (e.g., how action-heavy vs. romantic an item is)
- Their product predicts how much user i will like item j, as the sketch below shows!
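Here is a small sketch that reuses U_k, s_k, and Vt_k from the code above: a single predicted rating is just the dot product between a user's latent factors and an item's latent factors.
# Predicted rating for user i, item j as a dot product of latent factors
# (reuses U_k, s_k, Vt_k, ratings_predicted from the block above)
i, j = 0, 2  # User 1, Item 3
user_factors = U_k[i] * s_k      # fold the singular values into the user side
item_factors = Vt_k[:, j]
prediction = user_factors @ item_factors
print(f"User {i+1}, Item {j+1} prediction: {prediction:.2f}")
print(f"Same as the full reconstruction? {np.isclose(prediction, ratings_predicted[i, j])}")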
Optimization with Linear Algebra: Gradient Descent
Machine learning training uses gradient descent, which relies on linear algebra.
Linear Regression with Gradient Descent
# Generate synthetic data
np.random.seed(42)
n_samples = 20
X = np.linspace(0, 5, n_samples).reshape(-1, 1)
X_with_bias = np.c_[np.ones(n_samples), X] # Add bias column
# True relationship: y = 3 + 2x + noise
true_params = np.array([3, 2])
y = X_with_bias @ true_params + np.random.randn(n_samples) * 0.5
print(f"Dataset: {n_samples} samples")
print(f"True parameters: intercept={true_params[0]}, slope={true_params[1]}")
# Initialize parameters randomly
params = np.random.randn(2)
learning_rate = 0.05  # small enough to stay stable, large enough for both parameters to converge in 100 steps
n_iterations = 100
print(f"\nInitial parameters: {params}")
# Gradient descent
for i in range(n_iterations):
    # Predictions
    predictions = X_with_bias @ params
    # Error
    error = predictions - y
    # Loss (Mean Squared Error)
    loss = np.mean(error**2)
    # Gradient: (2/n) * Xᵀ @ (X @ params - y)
    gradient = (2/n_samples) * X_with_bias.T @ error
    # Update
    params = params - learning_rate * gradient
    if (i+1) % 20 == 0:
        print(f"Iteration {i+1}: Loss = {loss:.6f}, Params = {params}")
print(f"\nFinal parameters: {params}")
print(f"True parameters: {true_params}")
print(f"Close? {np.allclose(params, true_params, atol=0.5)}")
Key point: The gradient calculation uses matrix operations (Xᵀ @ error). This is pure linear algebra!
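Because this is plain linear regression, you can also cross-check gradient descent against the closed-form least-squares solution. A small sketch using NumPy's solver on the same X_with_bias and y:
# Closed-form least squares for comparison
params_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X_with_bias, y, rcond=None)
print(f"Gradient descent parameters: {params}")
print(f"Least-squares parameters:    {params_lstsq}")
print(f"Roughly equal? {np.allclose(params, params_lstsq, atol=0.5)}")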
Putting It All Together: Complete ML Pipeline
Let's use multiple concepts in one example:
# Step 1: Generate high-dimensional data
np.random.seed(123)
n_samples = 50
n_features = 10
# Create correlated features (realistic scenario)
base1 = np.random.randn(n_samples)
base2 = np.random.randn(n_samples)
data = np.zeros((n_samples, n_features))
# Features 0-4 correlated with base1
for i in range(5):
    data[:, i] = base1 + np.random.randn(n_samples) * 0.5
# Features 5-9 correlated with base2
for i in range(5, 10):
    data[:, i] = base2 + np.random.randn(n_samples) * 0.5
print(f"Original data: {data.shape}")
print(f"First sample (10 features): {data[0]}")
# Step 2: Check how much independent structure the features really have
rank = np.linalg.matrix_rank(data)
print(f"\nNumerical rank: {rank} (out of {min(data.shape)} possible)")
print("The noise makes the numerical rank full, but only ~2 directions carry most of the variance. PCA will reveal this.")
# Step 3: Apply PCA
mean = np.mean(data, axis=0)
data_centered = data - mean
cov_matrix = np.cov(data_centered.T)
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Sort
sorted_idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_idx]
eigenvectors = eigenvectors[:, sorted_idx]
# Explained variance
explained_var = eigenvalues / np.sum(eigenvalues)
cumulative = np.cumsum(explained_var)
print("\nExplained variance by component:")
for i in range(min(5, len(explained_var))):
    print(f" PC{i+1}: {explained_var[i]*100:.2f}% (cumulative: {cumulative[i]*100:.2f}%)")
# Step 4: Reduce to 2D for visualization
n_components = 2
pcs = eigenvectors[:, :n_components]
data_reduced = data_centered @ pcs
print(f"\nReduced data: {data_reduced.shape}")
print(f"Preserved {cumulative[n_components-1]*100:.1f}% of variance")
print(f"\nFirst sample in 2D: {data_reduced[0]}")
# Step 5: Use SVD for comparison
U, s, Vt = np.linalg.svd(data_centered, full_matrices=False)
data_svd = U[:, :n_components] @ np.diag(s[:n_components])
print(f"\nUsing SVD instead:")
print(f"First sample: {data_svd[0]}")
print(f"Same result as PCA? {np.allclose(np.abs(data_reduced), np.abs(data_svd))}")
Result: We reduced 10 features to 2 while keeping roughly 88% of the variance!
Common Applications Summary
| Technique | Use Case | Example |
|---|---|---|
| Eigenvalues | Find important directions | PCA, stability analysis |
| SVD | Compress data | Image compression, LSA |
| PCA | Reduce dimensions | Visualization, feature extraction |
| Matrix Factorization | Find patterns | Recommender systems |
| Gradient Descent | Optimize models | Training neural networks |
Debugging Tips for Advanced LA
Problem 1: PCA Not Reducing Dimensions Well
# Check eigenvalues distribution
eigenvalues = np.array([5.2, 4.8, 4.5, 0.3, 0.1])
explained = eigenvalues / eigenvalues.sum()
print("Eigenvalues:", eigenvalues)
print("Explained variance:", explained)
# If first few components explain < 80%, your features are truly diverse!
print(f"First 2 components: {explained[:2].sum()*100:.1f}%")
if explained[:2].sum() < 0.8:
    print("Warning: Features are not redundant. May need more components!")
Problem 2: Numerical Instability in SVD
# Very small singular values can cause issues
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9.00001]])
U, s, Vt = np.linalg.svd(A)
print("Singular values:", s)
# Check condition number
condition_number = s[0] / s[-1]
print(f"Condition number: {condition_number:.2e}")
if condition_number > 1e10:
    print("Warning: Ill-conditioned matrix! Consider regularization.")
Problem 3: Choosing Number of Components
def cumulative_variance_report(eigenvalues):
    """Report how many components are needed to reach common variance thresholds"""
    explained = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(explained)
    print("Components needed for:")
    for threshold in [0.8, 0.9, 0.95, 0.99]:
        n_comp = np.argmax(cumulative >= threshold) + 1
        print(f" {threshold*100:.0f}% variance: {n_comp} components")
    return cumulative
eigenvalues = np.array([10, 5, 3, 1, 0.5, 0.3, 0.1, 0.05])
cumulative = cumulative_variance_report(eigenvalues)
Quick Reference: Advanced Operations
# Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)
# SVD
U, s, Vt = np.linalg.svd(A)
# PCA (manual)
data_centered = data - data.mean(axis=0)
cov_matrix = np.cov(data_centered.T)
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Solve linear system (better than inverse)
x = np.linalg.solve(A, b)
# Least squares (overdetermined systems)
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
# Matrix norm (Frobenius)
norm = np.linalg.norm(A, 'fro')
# Condition number
cond = np.linalg.cond(A)
From Theory to Practice
You've now learned:
- ✅ Eigenvalues/Eigenvectors: Finding important directions
- ✅ SVD: The universal decomposition tool
- ✅ PCA: Practical dimensionality reduction
- ✅ Matrix Factorization: Recommendation systems
- ✅ Optimization: Training ML models
These aren't just math exercises—they're the foundation of:
- Scikit-learn's PCA (dimensionality reduction)
- TensorFlow/PyTorch (neural network operations)
- Recommendation engines (Netflix, Spotify)
- Image compression (low-rank approximations, the same idea behind lossy codecs)
- Natural Language Processing (LSA, word embeddings)
Every time you call model.fit(), linear algebra is working behind the scenes!
Final Challenge: Build Your Own
Try implementing these from scratch:
- Image compressor using SVD (keep top k singular values)
- Recommender system with matrix factorization
- Face recognition using eigenfaces (PCA on face images)
- Text analysis with Latent Semantic Analysis (SVD on document-term matrix)
The best way to master linear algebra is to use it in real projects!
You've completed the journey from basic vectors to advanced decompositions! Linear algebra isn't just math—it's the language that makes modern AI possible. Keep practicing, keep building, and you'll see these patterns everywhere in ML! If this series helped you master linear algebra for machine learning, I'd love to hear about your projects! Connect with me on Twitter or LinkedIn.
Support My Work
If this comprehensive guide to eigenvalues, PCA, and matrix decomposition helped you understand advanced linear algebra for machine learning, I'd really appreciate your support! Creating detailed, free educational content with practical examples takes significant time and effort. Your support helps me continue sharing knowledge and creating more helpful tutorials for students like you.
☕ Buy me a coffee - Every contribution, big or small, means the world to me and keeps me motivated to create more content!
Cover image by Robin Glauser on Unsplash