Eigenvalues, PCA, and Matrix Decomposition: Advanced Linear Algebra for ML
Master eigenvalues, eigenvectors, SVD, and PCA with Python. Learn dimensionality reduction and see real ML applications of advanced linear algebra concepts

The Moment Eigenvalues Clicked for Me
I was staring at a dataset with 100 features wondering: "How do I visualize this?" Then I learned about PCA (Principal Component Analysis).
PCA magically reduced my 100 features to 2, and I could finally plot my data. But how? The secret was eigenvalues and eigenvectors—concepts that seemed abstract until I saw them compress data while keeping the important patterns!
This guide will demystify these advanced topics and show you how they power real ML algorithms.
Eigenvalues and Eigenvectors: The Matrix's DNA
What Are They?
For a matrix A and vector v:
- Eigenvector: A direction that doesn't change when A is applied (only scales)
- Eigenvalue (λ): How much it scales
Equation: Av = λv
Think of it like this: Most vectors get rotated and scaled by a matrix. But eigenvectors? They only get scaled!
import numpy as np
# Simple 2×2 matrix
A = np.array([
    [4, -2],
    [1, 1]
])
# Find eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Matrix A:")
print(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"\nEigenvectors (as columns):")
print(eigenvectors)
Verifying the Eigenvalue Equation
Let's check: Does Av = λv hold?
# Take first eigenvector
v1 = eigenvectors[:, 0]
lambda1 = eigenvalues[0]
# Left side: A @ v
left_side = A @ v1
# Right side: λ * v
right_side = lambda1 * v1
print(f"Eigenvector v1: {v1}")
print(f"Eigenvalue λ1: {lambda1:.4f}")
print(f"\nA @ v1 = {left_side}")
print(f"λ1 * v1 = {right_side}")
print(f"Equal? {np.allclose(left_side, right_side)}")
The outputs match, confirming that v1 really is an eigenvector of A with eigenvalue λ1.
Why Eigenvalues Matter in ML
Eigenvalues reveal:
- Variance direction: Where data spreads most (PCA)
- System stability: Whether iterative algorithms converge
- Data patterns: Hidden structure in your dataset
Larger eigenvalue = more important direction!
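To make "larger eigenvalue = more important direction" concrete, here is a minimal sketch on a small synthetic point cloud (made up purely for illustration): the covariance matrix of data stretched along one axis has its largest eigenvalue aligned with that stretch.
import numpy as np
# Hypothetical 2D point cloud: wide along x, narrow along y
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])  # std 3 in x, 0.5 in y
# Eigen-decompose the covariance matrix
cov = np.cov(points.T)
eigenvalues, eigenvectors = np.linalg.eig(cov)
# The eigenvector with the largest eigenvalue points (roughly) along the x-axis,
# the direction of greatest spread
top = np.argmax(eigenvalues)
print("Eigenvalues:", eigenvalues)
print("Direction of most variance:", eigenvectors[:, top])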
Eigenvalue Decomposition: Breaking Down Matrices
The Decomposition Formula
For a square matrix A with n independent eigenvectors:
A = QΛQ⁻¹
Where:
- Q: Matrix with eigenvectors as columns
- Λ: Diagonal matrix with eigenvalues
- Q⁻¹: Inverse of Q
# Eigenvalue decomposition
A = np.array([[3, 1], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
# Build the components
Q = eigenvectors
Lambda = np.diag(eigenvalues)
Q_inv = np.linalg.inv(Q)
print("Original matrix A:")
print(A)
print("\nQ (eigenvectors):")
print(Q)
print("\nΛ (eigenvalues on diagonal):")
print(Lambda)
print("\nQ⁻¹:")
print(Q_inv)
# Reconstruct: A = Q @ Λ @ Q⁻¹
A_reconstructed = Q @ Lambda @ Q_inv
print("\nReconstructed A:")
print(A_reconstructed)
print(f"Matches original? {np.allclose(A, A_reconstructed)}")
Why This Matters: Decomposition reveals the "true nature" of a transformation!
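Here is one concrete payoff, sketched by reusing Q, Lambda, and Q_inv from the block above: because Λ is diagonal, matrix powers collapse to powers of the eigenvalues (Aⁿ = QΛⁿQ⁻¹).
# Powers of A via the decomposition (reuses Q, Lambda, Q_inv from above)
# Lambda is diagonal, so Lambda**5 just raises each eigenvalue to the 5th power
A_power_5 = Q @ (Lambda**5) @ Q_inv
print("A^5 via eigen-decomposition:")
print(A_power_5)
print("A^5 via repeated multiplication:")
print(np.linalg.matrix_power(A, 5))
print(f"Match? {np.allclose(A_power_5, np.linalg.matrix_power(A, 5))}")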
Singular Value Decomposition (SVD): The Ultimate Tool
SVD Works on ANY Matrix!
Unlike eigenvalue decomposition (only square matrices), SVD works on any matrix (even non-square).
A = UΣVᵀ
Where:
- U: Left singular vectors (m×m)
- Σ: Diagonal matrix of singular values (m×n)
- Vᵀ: Right singular vectors, transposed (n×n)
# SVD on non-square matrix
A = np.array([
    [3, 1, 1],
    [1, 3, 1]
])
print(f"Original matrix (2×3):")
print(A)
# Perform SVD
U, singular_values, Vt = np.linalg.svd(A)
print(f"\nU shape: {U.shape}")
print(f"Singular values: {singular_values}")
print(f"Vt shape: {Vt.shape}")
# Create Sigma matrix (same shape as A)
Sigma = np.zeros_like(A, dtype=float)
np.fill_diagonal(Sigma, singular_values)
print(f"\nΣ matrix:")
print(Sigma)
# Reconstruct: A = U @ Σ @ Vᵀ
A_reconstructed = U @ Sigma @ Vt
print(f"\nReconstructed A:")
print(A_reconstructed)
print(f"Matches? {np.allclose(A, A_reconstructed)}")
What Are Singular Values?
Singular values tell you how important each component is:
- Largest singular value: Most important pattern
- Smallest singular value: Least important (often noise)
Key insight: You can often drop small singular values and keep 95%+ of information!
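One quick way to check this is to look at the "energy" each component carries (its squared singular value as a fraction of the total). A minimal sketch, using a small matrix made up for illustration:
import numpy as np
# How much energy (sum of squared singular values) the top components keep
M = np.array([
    [5.0, 4.9, 0.1],
    [4.8, 5.1, 0.2],
    [0.1, 0.2, 0.05]
])
U, s, Vt = np.linalg.svd(M)
energy = s**2 / np.sum(s**2)
print("Singular values:", s)
print("Energy per component:", energy)
print(f"Top component alone keeps {energy[0]*100:.1f}% of the energy")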
PCA: Dimensionality Reduction in Action
What is PCA?
Principal Component Analysis finds the directions (principal components) of maximum variance in your data. It's like finding the "best view" of your data.
Steps:
- Center the data (subtract mean)
- Compute covariance matrix
- Find eigenvalues & eigenvectors
- Sort by eigenvalues (largest first)
- Project data onto top k eigenvectors
Implementing PCA from Scratch
Let's reduce 3D data to 2D!
# Create sample dataset (5 samples, 3 features)
data = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 0.2],
    [2.2, 2.9, 1.5],
    [1.9, 2.2, 1.2],
    [3.1, 3.0, 1.8]
])
print("Original data (5 samples × 3 features):")
print(data)
print(f"Shape: {data.shape}")
# Step 1: Center the data
mean = np.mean(data, axis=0)
data_centered = data - mean
print(f"\nMean: {mean}")
print("Centered data:")
print(data_centered)
# Step 2: Covariance matrix
cov_matrix = np.cov(data_centered.T)
print(f"\nCovariance matrix:")
print(cov_matrix)
# Step 3: Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Step 4: Sort by eigenvalues (descending)
sorted_idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_idx]
eigenvectors = eigenvectors[:, sorted_idx]
print(f"\nEigenvalues (sorted): {eigenvalues}")
print("Principal components (eigenvectors):")
print(eigenvectors)
# Step 5: Calculate explained variance
explained_variance = eigenvalues / np.sum(eigenvalues)
print("\nExplained variance ratio:")
for i, var in enumerate(explained_variance):
    print(f" PC{i+1}: {var*100:.2f}%")
cumulative = np.cumsum(explained_variance)
print(f"Cumulative: {cumulative}")
# Step 6: Project onto first 2 principal components
n_components = 2
principal_components = eigenvectors[:, :n_components]
data_pca = data_centered @ principal_components
print(f"\nReduced data (5 samples × 2 components):")
print(data_pca)
print(f"Shape: {data_pca.shape}")
print(f"\nPreserved {cumulative[n_components-1]*100:.1f}% of variance!")
Understanding PCA Results
Explained variance tells you how much information each component captures:
- PC1: 96% → Most data variation in this direction
- PC2: 3% → Second most variation
- PC3: 1% → Minimal variation (can drop!)
By keeping just PC1 and PC2, we reduced dimensions from 3 to 2 while keeping ~99% of information!
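If you happen to have scikit-learn installed (an assumption; it is not required anywhere else in this guide), you can sanity-check the from-scratch results against its built-in PCA:
# Optional cross-check with scikit-learn (assumes scikit-learn is installed)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
data_sklearn = pca.fit_transform(data)  # 'data' and 'data_pca' come from the example above
print("Explained variance ratio (sklearn):", pca.explained_variance_ratio_)
# Components can differ by sign, so compare absolute values
print("Matches the manual PCA?", np.allclose(np.abs(data_sklearn), np.abs(data_pca)))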
Real-World Application: Image Compression with SVD
Images are matrices of pixels. SVD can compress them!
# Create a simple "image" (30×30 matrix)
x = np.linspace(0, 10, 30)
y = np.linspace(0, 10, 30)
X, Y = np.meshgrid(x, y)
# Generate pattern
image = np.sin(X) * np.cos(Y) + 0.5 * np.sin(2*X)
image += np.random.randn(30, 30) * 0.1 # Add noise
print(f"Original image shape: {image.shape}")
print(f"Original storage: {image.size} values")
# Perform SVD
U, s, Vt = np.linalg.svd(image)
print(f"\nNumber of singular values: {len(s)}")
print(f"Top 5 singular values: {s[:5]}")
# Compress by keeping only k largest singular values
def compress_image(U, s, Vt, k):
    """Reconstruct image using only k components"""
    U_k = U[:, :k]
    s_k = s[:k]
    Vt_k = Vt[:k, :]
    return U_k @ np.diag(s_k) @ Vt_k
# Try different compression levels
for k in [2, 5, 10, 15]:
    compressed = compress_image(U, s, Vt, k)
    # Calculate compression ratio and error
    original_storage = image.shape[0] * image.shape[1]
    compressed_storage = k * (image.shape[0] + image.shape[1] + 1)
    ratio = (1 - compressed_storage/original_storage) * 100
    error = np.linalg.norm(image - compressed) / np.linalg.norm(image)
    print(f"\nk={k}:")
    print(f" Storage: {compressed_storage} values ({ratio:.1f}% reduction)")
    print(f" Error: {error*100:.2f}%")
Impressive result: with just k=10 components (about a 32% storage reduction here), the error is only around 8%!
This is the same low-rank idea behind lossy image compression: keep the important components and drop the rest. (JPEG itself uses a related transform, the DCT, rather than SVD.)
Matrix Factorization: Recommendation Systems
Netflix, Amazon, Spotify all use matrix factorization to recommend items!
The Recommendation Problem
You have a ratings matrix (users × items) with many missing values. Goal: Predict missing ratings.
Solution: Factor the matrix into User Features × Item Features
# User-item rating matrix (0 = not rated)
ratings = np.array([
    [5, 4, 0, 0, 1, 0],  # User 1
    [4, 0, 0, 0, 1, 0],  # User 2
    [0, 0, 5, 4, 0, 0],  # User 3
    [0, 0, 4, 5, 0, 1],  # User 4
    [0, 1, 0, 0, 4, 5]   # User 5
])
print("Original ratings (5 users × 6 items):")
print(ratings)
print("(0 = not rated)")
# Replace 0s with the mean rating before SVD (cast to float so the mean isn't truncated)
mask = ratings > 0
mean_rating = ratings[mask].mean()
ratings_filled = ratings.astype(float)
ratings_filled[ratings_filled == 0] = mean_rating
print(f"\nMean rating: {mean_rating:.2f}")
# Perform SVD
U, s, Vt = np.linalg.svd(ratings_filled, full_matrices=False)
# Keep only k=2 latent factors
k = 2
U_k = U[:, :k]
s_k = s[:k]
Vt_k = Vt[:k, :]
# Reconstruct ratings
ratings_predicted = U_k @ np.diag(s_k) @ Vt_k
print(f"\nReduced to {k} latent factors")
print("Predicted ratings:")
print(np.round(ratings_predicted, 2))
# Example predictions for unrated items
print("\nPredictions for originally unrated items:")
print(f"User 1, Item 3: {ratings_predicted[0, 2]:.2f}")
print(f"User 3, Item 1: {ratings_predicted[2, 0]:.2f}")
How it works:
- U represents user preferences (e.g., how much a user likes action vs. romance)
- Vt represents item characteristics (e.g., how action-heavy vs. romantic an item is)
- Their product predicts how much user i will like item j, as the sketch below shows!
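Here is a small sketch that reuses U_k, s_k, and Vt_k from the code above: a single predicted rating is just the dot product between a user's latent factors and an item's latent factors.
# Predicted rating for user i, item j as a dot product of latent factors
# (reuses U_k, s_k, Vt_k, ratings_predicted from the block above)
i, j = 0, 2  # User 1, Item 3
user_factors = U_k[i] * s_k      # fold the singular values into the user side
item_factors = Vt_k[:, j]
prediction = user_factors @ item_factors
print(f"User {i+1}, Item {j+1} prediction: {prediction:.2f}")
print(f"Same as the full reconstruction? {np.isclose(prediction, ratings_predicted[i, j])}")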
Optimization with Linear Algebra: Gradient Descent
Machine learning training uses gradient descent, which relies on linear algebra.
Linear Regression with Gradient Descent
# Generate synthetic data
np.random.seed(42)
n_samples = 20
X = np.linspace(0, 5, n_samples).reshape(-1, 1)
X_with_bias = np.c_[np.ones(n_samples), X] # Add bias column
# True relationship: y = 3 + 2x + noise
true_params = np.array([3, 2])
y = X_with_bias @ true_params + np.random.randn(n_samples) * 0.5
print(f"Dataset: {n_samples} samples")
print(f"True parameters: intercept={true_params[0]}, slope={true_params[1]}")
# Initialize parameters randomly
params = np.random.randn(2)
learning_rate = 0.05  # small enough to stay stable, large enough for both parameters to converge in 100 steps
n_iterations = 100
print(f"\nInitial parameters: {params}")
# Gradient descent
for i in range(n_iterations):
    # Predictions
    predictions = X_with_bias @ params
    # Error
    error = predictions - y
    # Loss (Mean Squared Error)
    loss = np.mean(error**2)
    # Gradient: (2/n) * Xᵀ @ (X @ params - y)
    gradient = (2/n_samples) * X_with_bias.T @ error
    # Update
    params = params - learning_rate * gradient
    if (i+1) % 20 == 0:
        print(f"Iteration {i+1}: Loss = {loss:.6f}, Params = {params}")
print(f"\nFinal parameters: {params}")
print(f"True parameters: {true_params}")
print(f"Close? {np.allclose(params, true_params, atol=0.5)}")
Key point: The gradient calculation uses matrix operations (Xᵀ @ error). This is pure linear algebra!
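Because this is plain linear regression, you can also cross-check gradient descent against the closed-form least-squares solution. A small sketch using NumPy's solver on the same X_with_bias and y:
# Closed-form least squares for comparison
params_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X_with_bias, y, rcond=None)
print(f"Gradient descent parameters: {params}")
print(f"Least-squares parameters:    {params_lstsq}")
print(f"Roughly equal? {np.allclose(params, params_lstsq, atol=0.5)}")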
Putting It All Together: Complete ML Pipeline
Let's use multiple concepts in one example:
# Step 1: Generate high-dimensional data
np.random.seed(123)
n_samples = 50
n_features = 10
# Create correlated features (realistic scenario)
base1 = np.random.randn(n_samples)
base2 = np.random.randn(n_samples)
data = np.zeros((n_samples, n_features))
# Features 0-4 correlated with base1
for i in range(5):
    data[:, i] = base1 + np.random.randn(n_samples) * 0.5
# Features 5-9 correlated with base2
for i in range(5, 10):
    data[:, i] = base2 + np.random.randn(n_samples) * 0.5
print(f"Original data: {data.shape}")
print(f"First sample (10 features): {data[0]}")
# Step 2: Check how much independent structure the features really have
rank = np.linalg.matrix_rank(data)
print(f"\nNumerical rank: {rank} (out of {min(data.shape)} possible)")
print("The noise makes the numerical rank full, but only ~2 directions carry most of the variance. PCA will reveal this.")
# Step 3: Apply PCA
mean = np.mean(data, axis=0)
data_centered = data - mean
cov_matrix = np.cov(data_centered.T)
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Sort
sorted_idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_idx]
eigenvectors = eigenvectors[:, sorted_idx]
# Explained variance
explained_var = eigenvalues / np.sum(eigenvalues)
cumulative = np.cumsum(explained_var)
print("\nExplained variance by component:")
for i in range(min(5, len(explained_var))):
    print(f" PC{i+1}: {explained_var[i]*100:.2f}% (cumulative: {cumulative[i]*100:.2f}%)")
# Step 4: Reduce to 2D for visualization
n_components = 2
pcs = eigenvectors[:, :n_components]
data_reduced = data_centered @ pcs
print(f"\nReduced data: {data_reduced.shape}")
print(f"Preserved {cumulative[n_components-1]*100:.1f}% of variance")
print(f"\nFirst sample in 2D: {data_reduced[0]}")
# Step 5: Use SVD for comparison
U, s, Vt = np.linalg.svd(data_centered, full_matrices=False)
data_svd = U[:, :n_components] @ np.diag(s[:n_components])
print(f"\nUsing SVD instead:")
print(f"First sample: {data_svd[0]}")
print(f"Same result as PCA? {np.allclose(np.abs(data_reduced), np.abs(data_svd))}")
Result: We reduced 10 features to 2 while keeping roughly 88% of the variance!
Common Applications Summary
| Technique | Use Case | Example |
|---|---|---|
| Eigenvalues | Find important directions | PCA, stability analysis |
| SVD | Compress data | Image compression, LSA |
| PCA | Reduce dimensions | Visualization, feature extraction |
| Matrix Factorization | Find patterns | Recommender systems |
| Gradient Descent | Optimize models | Training neural networks |
Debugging Tips for Advanced LA
Problem 1: PCA Not Reducing Dimensions Well
# Check eigenvalues distribution
eigenvalues = np.array([5.2, 4.8, 4.5, 0.3, 0.1])
explained = eigenvalues / eigenvalues.sum()
print("Eigenvalues:", eigenvalues)
print("Explained variance:", explained)
# If first few components explain < 80%, your features are truly diverse!
print(f"First 2 components: {explained[:2].sum()*100:.1f}%")
if explained[:2].sum() < 0.8:
    print("Warning: Features are not redundant. May need more components!")
Problem 2: Numerical Instability in SVD
# Very small singular values can cause issues
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9.00001]])
U, s, Vt = np.linalg.svd(A)
print("Singular values:", s)
# Check condition number
condition_number = s[0] / s[-1]
print(f"Condition number: {condition_number:.2e}")
if condition_number > 1e10:
    print("Warning: Ill-conditioned matrix! Consider regularization.")
Problem 3: Choosing Number of Components
def cumulative_variance_report(eigenvalues):
    """Report how many components are needed to reach common variance thresholds"""
    explained = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(explained)
    print("Components needed for:")
    for threshold in [0.8, 0.9, 0.95, 0.99]:
        n_comp = np.argmax(cumulative >= threshold) + 1
        print(f" {threshold*100:.0f}% variance: {n_comp} components")
    return cumulative
eigenvalues = np.array([10, 5, 3, 1, 0.5, 0.3, 0.1, 0.05])
cumulative = cumulative_variance_report(eigenvalues)
Quick Reference: Advanced Operations
# Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)
# SVD
U, s, Vt = np.linalg.svd(A)
# PCA (manual)
data_centered = data - data.mean(axis=0)
cov_matrix = np.cov(data_centered.T)
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Solve linear system (better than inverse)
x = np.linalg.solve(A, b)
# Least squares (overdetermined systems)
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
# Matrix norm (Frobenius)
norm = np.linalg.norm(A, 'fro')
# Condition number
cond = np.linalg.cond(A)
From Theory to Practice
You've now learned:
- ✅ Eigenvalues/Eigenvectors: Finding important directions
- ✅ SVD: The universal decomposition tool
- ✅ PCA: Practical dimensionality reduction
- ✅ Matrix Factorization: Recommendation systems
- ✅ Optimization: Training ML models
These aren't just math exercises—they're the foundation of:
- Scikit-learn's PCA (dimensionality reduction)
- TensorFlow/PyTorch (neural network operations)
- Recommendation engines (Netflix, Spotify)
- Image compression (low-rank approximations, the same idea behind lossy codecs)
- Natural Language Processing (LSA, word embeddings)
Every time you call model.fit(), linear algebra is working behind the scenes!
Final Challenge: Build Your Own
Try implementing these from scratch:
- Image compressor using SVD (keep top k singular values)
- Recommender system with matrix factorization
- Face recognition using eigenfaces (PCA on face images)
- Text analysis with Latent Semantic Analysis (SVD on document-term matrix)
The best way to master linear algebra is to use it in real projects!
You've completed the journey from basic vectors to advanced decompositions! Linear algebra isn't just math—it's the language that makes modern AI possible. Keep practicing, keep building, and you'll see these patterns everywhere in ML! If this series helped you master linear algebra for machine learning, I'd love to hear about your projects! Connect with me on Twitter or LinkedIn.
Support My Work
If this comprehensive guide to eigenvalues, PCA, and matrix decomposition helped you understand advanced linear algebra for machine learning, I'd really appreciate your support! Creating detailed, free educational content with practical examples takes significant time and effort. Your support helps me continue sharing knowledge and creating more helpful tutorials for students like you.
☕ Buy me a coffee - Every contribution, big or small, means the world to me and keeps me motivated to create more content!
Cover image by Robin Glauser on Unsplash