Discovering Computer Vision: Teaching Computers to See

Curious about computer vision? Learn what CV is, how it works, and real applications from face detection to self-driving cars.

📅 Published: April 18, 2025 ✏️ Updated: May 2, 2025 By Ojaswi Athghara
#comp-vision #beginners #cv-intro #image-ai #learn-cv #curious

The Moment That Sparked My Curiosity

I was unlocking my phone with Face ID when it hit me: How does my phone recognize my face? How does it know it's me and not a photo of me or my twin (if I had one)?

That simple everyday action—unlocking my phone—suddenly seemed like magic. My phone is seeing my face and understanding it's mine. But how?

That question sent me down a rabbit hole into the world of computer vision, and honestly, I'm still amazed by what I'm discovering. This isn't just about face recognition—computers are learning to see and understand the visual world in ways that seemed like science fiction just a few years ago.

I'm still pretty new to all of this, but I want to share what I'm learning. If you've ever wondered how computers can see, come explore with me.

What Is Computer Vision, Actually?

The simple answer I keep seeing: Computer vision is teaching computers to understand images and videos the way humans do.

But what does that really mean?

Think about what happens when you look at a photo:

  • You instantly recognize objects (that's a dog)
  • You understand the scene (they're at the beach)
  • You can identify people (that's my friend)
  • You perceive depth (the dog is closer than the ocean)
  • You read text (the sign says 'Beach Closed')

You do all this automatically, without thinking. Computer vision is trying to teach computers to do the same thing.

Why Is This Hard?

Here's what I didn't appreciate at first: seeing is hard.

When I look at a photo of a cat, I instantly know it's a cat. But to a computer, that photo is just a grid of numbers—pixel values representing colors. How do you get from numbers to "that's definitely a cat"?

Consider these challenges:

  • The same object looks different from different angles
  • Lighting changes how things look
  • Objects can be partially hidden
  • There's background noise and clutter
  • Things come in different sizes, colors, and shapes

We solve these problems effortlessly. Teaching computers to do it? That's the challenge—and the fascination—of computer vision.

Why Computer Vision Captured My Attention

1. It's Everywhere (Once You Notice)

After learning about computer vision, I started seeing it everywhere:

My phone:

  • Face ID unlocking
  • Portrait mode blurring backgrounds
  • Photo app organizing pictures by people and places
  • QR code scanning

At home:

  • Robot vacuums avoiding obstacles
  • Smart doorbells detecting people
  • Photo filters on social media

Out in the world:

  • Self-driving cars seeing pedestrians and traffic signs
  • Security cameras detecting suspicious activity
  • Medical imaging diagnosing diseases
  • Manufacturing quality control checking products

It's literally everywhere, quietly working in the background, making technology smarter and more useful.

2. The Problems Feel Tangible

Unlike some areas of AI that feel abstract, computer vision problems are immediately understandable:

  • Can you detect all the faces in this photo?
  • Is this product defective?
  • What breed is this dog?
  • Is this person wearing a mask?
  • Where are the lane lines on this road?

I can look at these problems and instantly understand what success looks like. That makes it easier to learn and more satisfying when things work.

3. The Impact Is Huge

Computer vision is enabling:

  • Healthcare: Earlier disease detection through medical imaging
  • Accessibility: Apps that describe the world to blind users
  • Safety: Cars that can avoid accidents
  • Agriculture: Drones monitoring crop health
  • Conservation: Tracking endangered species automatically

We're using computer vision to solve real problems that matter. That's incredibly motivating to learn about.

What Can Computer Vision Do? (Real Examples That Amaze Me)

Image Classification

The foundational task: look at an image and identify what it contains. Systems can now classify images into thousands of categories, identify specific dog breeds, recognize diseases from X-rays, and sort trash.

I tried a pre-trained classifier online with my coffee cup photo—correct. Weird angle—still correct. Bad lighting—still worked. That's when it clicked: this is genuinely impressive technology.
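
If you want to try the same thing locally, here's a minimal sketch using a pre-trained ResNet from torchvision (coffee_cup.jpg is a placeholder filename, and the weights API assumes a reasonably recent torchvision):

import torch
from PIL import Image
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet, plus its matching preprocessing
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()
preprocess = weights.transforms()

# Load and preprocess the photo (the path is just an example)
img = Image.open('coffee_cup.jpg')
batch = preprocess(img).unsqueeze(0)  # add a batch dimension

# Predict and print the most likely ImageNet category
with torch.no_grad():
    probs = model(batch).softmax(dim=1)
top_prob, top_idx = probs.max(dim=1)
print(weights.meta["categories"][top_idx.item()], float(top_prob))

The pre-trained weights do all the heavy lifting here; the code is mostly loading, preprocessing, and printing.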

Object Detection

Not just what's in an image but where is it? In a street photo, systems identify and locate cars, pedestrians, traffic lights, and signs. Crucial for self-driving cars, surveillance, retail analytics, and sports analysis.

Watching real-time object detection with boxes drawn around detected objects—seriously cool.
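
Here's a rough sketch of how a basic, offline version might look with a pre-trained detector from torchvision (street.jpg is a placeholder; the model returns boxes, labels, and confidence scores that you could then draw on the image):

import torch
from PIL import Image
from torchvision import models
from torchvision.transforms.functional import to_tensor

# A pre-trained Faster R-CNN detector (trained on the COCO dataset)
weights = models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = models.detection.fasterrcnn_resnet50_fpn(weights=weights)
model.eval()

# The model expects a list of image tensors with values in [0, 1]
img = to_tensor(Image.open('street.jpg'))

with torch.no_grad():
    detections = model([img])[0]  # boxes, labels, scores for this image

# Print every confident detection with its label and bounding box
for box, label, score in zip(detections['boxes'], detections['labels'], detections['scores']):
    if score > 0.7:
        print(weights.meta["categories"][label.item()], box.tolist(), float(score))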

Face Recognition

Identifying who someone is from their face. Systems detect faces, extract unique features, and compare to known faces. Used for phone unlocking, photo organization, security, and finding missing persons.

This is powerful but raises privacy questions. How do we use this responsibly?
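
Full recognition needs a model that knows specific faces, but the first step, just detecting that a face is there, is surprisingly approachable with OpenCV's bundled Haar cascade. A rough sketch, assuming a local group_photo.jpg:

import cv2

# Load OpenCV's bundled pre-trained face detector (a Haar cascade)
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

# Detection works on grayscale images
img = cv2.imread('group_photo.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, width, height) rectangle per detected face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f"Found {len(faces)} face(s)")

# Draw a rectangle around each face and save the result
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite('faces_found.jpg', img)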

Image Segmentation

Understanding every pixel: label each as road, sidewalk, building, sky, person, car, tree. The computer knows exactly which pixels belong to which object. Used for medical imaging, autonomous driving, photo editing, and augmented reality.
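
As a taste of what this looks like in practice, here's a sketch using a pre-trained semantic segmentation model from torchvision (street.jpg is a placeholder; this particular model assigns each pixel one of 21 Pascal VOC classes):

import torch
from PIL import Image
from torchvision import models

# Pre-trained DeepLabV3 semantic segmentation model
weights = models.segmentation.DeepLabV3_ResNet50_Weights.DEFAULT
model = models.segmentation.deeplabv3_resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()

img = Image.open('street.jpg')
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    output = model(batch)['out'][0]   # shape: (num_classes, H, W)

# For every pixel, pick the most likely class
pixel_labels = output.argmax(dim=0)   # shape: (H, W)
classes = weights.meta["categories"]
print({classes[i] for i in pixel_labels.unique().tolist()})  # classes present in the image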

Pose Estimation

Understanding human body positions.

Example: From a photo or video, determine where someone's head, shoulders, elbows, hands, hips, knees, and feet are.

Uses:

  • Fitness apps analyzing your exercise form
  • Motion capture for animation
  • Sports performance analysis
  • Virtual try-on for clothing

I tried a pose estimation demo online with a video of me doing jumping jacks. Watching the skeleton overlay track my movements was... weird but fascinating.
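
Demos like that are often built on libraries such as MediaPipe. Here's a rough sketch of the single-image case, assuming the mediapipe package and a local jumping_jacks.jpg (the API has shifted between versions, so treat this as illustrative):

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# static_image_mode=True because we're processing one photo, not a video stream
with mp_pose.Pose(static_image_mode=True) as pose:
    img = cv2.imread('jumping_jacks.jpg')
    # MediaPipe expects RGB, while OpenCV loads images as BGR
    results = pose.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    # Each landmark is a body keypoint (nose, shoulders, elbows, ...) with x, y in [0, 1]
    for i, landmark in enumerate(results.pose_landmarks.landmark):
        print(i, round(landmark.x, 3), round(landmark.y, 3))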

Optical Character Recognition (OCR)

Reading text from images.

Example: Take a photo of a document and convert it to editable text.

Applications:

  • Digitizing old documents
  • Translating signs in real-time (Google Translate app)
  • Reading license plates
  • Extracting info from receipts

This one feels like genuine magic. Point your phone at text in another language, and it translates it on your screen, overlaying the translation on the image. How?!
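
The live translation overlay is a whole pipeline, but the core step of reading text out of an image is something you can try with the Tesseract engine via pytesseract (a minimal sketch, assuming Tesseract is installed and receipt.jpg is a placeholder photo):

from PIL import Image
import pytesseract

# Run the Tesseract OCR engine on the photo and get plain text back
img = Image.open('receipt.jpg')
text = pytesseract.image_to_string(img)

print(text)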

My Favorite Aha! Moments So Far

Understanding That Images Are Just Numbers

This blew my mind at first.

An image isn't some mysterious visual thing to a computer. It's literally a grid of numbers.

For a grayscale image:

  • Each pixel has a value from 0 (black) to 255 (white)
  • A 100x100 image is just 10,000 numbers

For a color image:

  • Each pixel has three values: Red, Green, Blue (RGB)
  • Each value ranges from 0-255
  • A 100x100 color image is 30,000 numbers

Example:

import numpy as np
from PIL import Image

# Load an image
img = Image.open('photo.jpg')

# Convert to numpy array (just numbers!)
img_array = np.array(img)

print(img_array.shape)  # (height, width, 3) for RGB
print(img_array[0][0])  # First pixel's RGB values
# Output might be: [142, 156, 178] - just three numbers!

Once I understood this, computer vision made more sense. If images are just numbers, then computer vision is really about finding patterns in those numbers.

Filters and Feature Detection

Early in my learning, I discovered image filters—not Instagram filters, but mathematical operations that highlight certain features.

Edge detection finds where brightness changes suddenly (edges of objects):

import cv2

# Load the image in grayscale (edge detection works on brightness values)
img = cv2.imread('photo.jpg', cv2.IMREAD_GRAYSCALE)

# Apply Canny edge detection (100 and 200 are the lower/upper intensity thresholds)
edges = cv2.Canny(img, 100, 200)

# Display the result until a key is pressed
cv2.imshow('Edges', edges)
cv2.waitKey(0)
cv2.destroyAllWindows()

Running this on a photo and seeing just the edges—that's when I realized: computers can extract meaningful information from those number grids.

Understanding How Neural Networks See

Modern computer vision mostly uses deep learning—neural networks that learn to recognize patterns.

What fascinated me: early layers detect simple patterns (edges, corners), while deeper layers detect complex patterns (eyes, faces, specific objects).

It's like building understanding layer by layer, from simple to complex. Kind of like how human vision works!
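
To make that layer-by-layer idea concrete, here's a tiny, untrained convolutional network in PyTorch. The layer sizes are arbitrary, but the structure shows how each convolution works on whatever the previous one found:

import torch
from torch import nn

# A toy convolutional network: each block sees the patterns found by the previous one
tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # early layer: edges, simple textures
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # middle layer: corners, simple shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # deeper layer: larger, more complex patterns
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),                            # final layer: scores for 10 classes
)

# A fake batch of one 3-channel 64x64 image, just to check the shapes flow through
fake_image = torch.randn(1, 3, 64, 64)
print(tiny_cnn(fake_image).shape)  # torch.Size([1, 10])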

The Scale of Training Data

Modern computer vision models are trained on millions of images.

ImageNet, a famous dataset, has:

  • Over 14 million images
  • 20,000+ categories
  • Images of everything from animals to vehicles to household objects

The sheer scale of this data is what makes modern computer vision work so well. The models see so many examples that they learn to generalize.

But also: collecting and labeling 14 million images? That's a massive human effort. Computer vision is built on the work of thousands of people annotating images.

What I'm Finding Challenging

The Math Is... There

To really understand computer vision deeply, you need linear algebra, calculus, probability, and statistics. I'm working through these gradually. You can use computer vision tools without deep math knowledge, but understanding why things work requires it.

Computational Requirements

Training models requires serious computing power—GPUs, hours of training time, and gigabytes of data. For learning, I'm mostly using pre-trained models and fine-tuning them, which is practical for beginners.
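
Here's a rough sketch of what that fine-tuning setup can look like with torchvision: load a pre-trained ResNet, freeze its learned features, and attach a small new output layer for your own classes (the class count and training data are placeholders):

import torch
from torch import nn
from torchvision import models

num_classes = 3  # e.g. three objects I care about (placeholder)

# Start from a ResNet-18 already trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained layers so only the new head gets trained
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with one sized for my classes
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new layer's parameters are passed to the optimizer
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
# ...then train as usual on a small labeled dataset of your own images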

So Many Approaches

CNNs, ResNet, YOLO, R-CNN, U-Net, Vision Transformers... there are so many architectures. The answer I'm discovering: it depends on the problem! For now, I'm focusing on understanding fundamentals rather than trying to master every architecture.

Computer Vision in My Daily Life (Now That I'm Aware)

Since learning about computer vision, I've been noticing it constantly:

  • Morning: Face unlock on my phone (face detection + recognition)
  • Work: Video call background blur (image segmentation)
  • Photos: Google Photos auto-organizing by people and places (scene understanding)
  • Shopping: Visual search finding similar products (image similarity)

It's integrated into so much of modern technology that we take it for granted. But once you learn how it works, every instance feels a little bit magical.

Resources I'm Finding Helpful

Since I'm still learning, here are resources helping me understand computer vision:

Tools Worth Exploring

  • OpenCV: The go-to library for computer vision in Python
  • TensorFlow/PyTorch: Deep learning frameworks
  • Pre-trained models: Use existing models before training your own

What I Want to Explore Next

I'm thinking about small projects to practice:

  1. Face detection: Can I build something that finds faces in photos?
  2. Object counter: Count specific objects in images
  3. Color detection: Identify dominant colors in photos
  4. Simple classifier: Train a model to recognize a few objects
  5. Motion detection: Detect when something moves in a video (sketched below)

None of these are groundbreaking, but they'll teach me the fundamentals.
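
For the motion detection idea, the classic beginner approach is frame differencing: compare each frame with the previous one and count how many pixels changed. A rough sketch with OpenCV, assuming a local clip.mp4:

import cv2

cap = cv2.VideoCapture('clip.mp4')
ok, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

frame_number = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_number += 1
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Pixels that changed a lot between consecutive frames suggest motion
    diff = cv2.absdiff(prev_gray, gray)
    _, moving = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    if cv2.countNonZero(moving) > 5000:  # arbitrary threshold; tune for your video
        print(f"Motion detected at frame {frame_number}")

    prev_gray = gray

cap.release()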

Questions I'm Still Figuring Out

Things I don't fully understand yet (and that's okay!): How do CNNs work exactly? What makes one architecture better than another? How do you deploy models in production? Every answer leads to more questions, and that's what makes this exciting.

Computer Vision vs Human Vision

One thing I'm learning: computer vision isn't trying to replicate human vision exactly. It's solving similar problems but in different ways.

Humans are better at:

  • Understanding context and common sense
  • Learning from very few examples
  • Adapting to new situations quickly
  • Understanding social and emotional cues

Computers are better at:

  • Processing thousands of images instantly
  • Not getting tired or distracted
  • Precise measurements and counting
  • Working in challenging conditions (darkness, infrared)
  • Consistency (no mood swings!)

The goal isn't to beat human vision but to augment it—let computers handle what they're good at, freeing humans for what we're good at.

The Ethical Questions I'm Thinking About

The more I learn, the more I think about the implications:

  • Privacy: face recognition showing up everywhere
  • Bias: models trained on non-diverse data
  • Surveillance: this technology in the wrong hands
  • Accountability: who's responsible when AI makes mistakes?

These aren't just technical problems—they're societal questions we need to address.

Why You Might Find Computer Vision Fascinating Too

If you like visual thinking, want immediate results, are curious about AI, enjoy practical applications, or love interdisciplinary work—computer vision might be worth exploring. It connects coding with visual understanding and produces results you can immediately see and evaluate.

The Best Part About Being a Beginner

Here's what I'm realizing: being new to computer vision is actually exciting because everything is fascinating. Every technique is clever, every application is interesting, every improvement is impressive.

Experts might dismiss something as basic object detection, but to me, the fact that computers can reliably detect objects in images still feels like magic.

I'm enjoying this phase of discovery. Every tutorial teaches me something new. Every demo I try amazes me. Every small project feels like an achievement.

What I've Learned So Far

After a few weeks exploring computer vision:

  1. It's more accessible than it seems: You can start using computer vision tools with basic Python knowledge.
  2. Pre-trained models are your friend: Don't start by training models from scratch. Use existing models to learn and experiment.
  3. Start with simple problems: Detecting faces or identifying objects is a better starting point than trying to build a self-driving car.
  4. Visual feedback is motivating: Seeing your code work on images and videos is immediately satisfying.
  5. The fundamentals matter: Understanding what images are (numbers!) and basic operations (filters, transformations) is more important than memorizing architectures.
  6. Community is helpful: The computer vision community shares code, models, and knowledge generously.

Looking Forward

I'm excited to go deeper into computer vision. Next on my learning list:

  • Building a basic image classifier
  • Understanding convolutional neural networks better
  • Experimenting with OpenCV more extensively
  • Trying object detection on my own images
  • Maybe contributing to open-source computer vision projects

This is just the beginning of my computer vision journey, and I'm looking forward to where curiosity takes me.

If you're curious about computer vision too, start somewhere! Try an online demo, watch a tutorial, experiment with OpenCV. The field is welcoming, the resources are abundant, and the problems are endlessly interesting.

Who knows? Maybe in a few months, I'll understand exactly how Face ID works. And maybe I'll build something that helps computers see and understand the world a little bit better.

Until then, I'm enjoying the journey of teaching myself how to teach computers to see.


Fascinated by computer vision? I'd love to hear what sparked your curiosity! Connect with me on Twitter or LinkedIn and let's learn together.

Support My Work

If this guide helped you, I'd really appreciate your support! Creating comprehensive, free content like this takes significant time and effort. Your support helps me continue sharing knowledge and creating more helpful resources for developers.

☕ Buy me a coffee - Every contribution, big or small, means the world to me and keeps me motivated to create more content!


Cover image by Ion Fet on Unsplash
