Python Testing Tutorial: Pytest and Unittest for Reliable Code

Master Python testing with pytest and unittest. Learn unit tests, fixtures, mocking, test-driven development, and testing ML models for production-ready code.

📅 Published: March 30, 2025 ✏️ Updated: May 15, 2025 By Ojaswi Athghara
#python #testing #pytest #unittest #tdd

Python Testing Tutorial: Pytest and Unittest for Reliable Code

The Bug That Cost Me a Week

My ML model worked perfectly on my laptop. In production? Complete disaster. A different data format crashed everything.

If only I had written tests:

def test_model_handles_missing_values():
    data_with_nulls = [1, None, 3, None, 5]
    result = preprocess(data_with_nulls)
    assert len(result) == 3  # Nulls removed
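
For context, a minimal preprocess that would satisfy this test (a hypothetical sketch, not my actual pipeline) could be as simple as:

def preprocess(data):
    # Drop missing values before they reach the model
    return [x for x in data if x is not None]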

Tests = Confidence in your code!

Why Testing Matters for ML

  • Catch bugs early (before production!)
  • Refactor safely (tests verify nothing broke)
  • Document behavior (tests show how code works)
  • Team collaboration (others understand your code)

Pytest: Modern Python Testing

Pytest is the most popular testing framework for Python - simple, powerful, and Pythonic.

Basic Test

# calculator.py
def add(a, b):
    return a + b

def multiply(a, b):
    return a * b

def divide(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b

# test_calculator.py
import pytest

from calculator import add, multiply, divide

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

def test_multiply():
    assert multiply(3, 4) == 12
    assert multiply(0, 5) == 0
    assert multiply(-2, 3) == -6

def test_divide():
    assert divide(10, 2) == 5
    assert divide(7, 2) == 3.5

def test_divide_by_zero():
    with pytest.raises(ValueError):
        divide(10, 0)

Run tests:

pytest test_calculator.py
# Or just
pytest  # Discovers all test_*.py files

Pytest Advantages

Why pytest over unittest?

  • Simple assert statements (no self.assertEqual)
  • Automatic test discovery
  • Better error messages
  • Powerful fixtures
  • Plugin ecosystem
  • Less boilerplate

Compare the same test:

# unittest (verbose)
import unittest

class TestCalculator(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
    
    def test_multiply(self):
        self.assertEqual(multiply(3, 4), 12)

# pytest (simple!)
def test_add():
    assert add(2, 3) == 5

def test_multiply():
    assert multiply(3, 4) == 12

Parametrized Tests

Test multiple cases without duplicating code:

import pytest

@pytest.mark.parametrize("a, b, expected", [
    (2, 3, 5),
    (0, 0, 0),
    (-1, 1, 0),
    (100, 200, 300),
])
def test_add_multiple_cases(a, b, expected):
    assert add(a, b) == expected

# ML example: test multiple scikit-learn models
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

@pytest.mark.parametrize("model_class", [
    RandomForestClassifier,
    GradientBoostingClassifier,
    LogisticRegression
])
def test_model_interface(model_class):
    model = model_class()
    assert hasattr(model, 'fit')
    assert hasattr(model, 'predict')

Test Markers

Organize and selectively run tests:

import pytest

@pytest.mark.slow
def test_train_large_model():
    """This test takes 5 minutes."""
    model.fit(huge_dataset)

@pytest.mark.fast
def test_preprocess():
    """Quick test."""
    data = preprocess([1, 2, 3])
    assert len(data) == 3

@pytest.mark.integration
def test_api_integration():
    """Tests external API."""
    response = call_external_api()
    assert response.status_code == 200

# Run only fast tests
# pytest -m fast

# Skip slow tests
# pytest -m "not slow"
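
Custom markers like these should be registered so pytest doesn't warn about unknown marks. A minimal pytest.ini at the project root (names here match the markers above) looks like:

# pytest.ini
[pytest]
markers =
    slow: long-running tests (deselect with -m "not slow")
    fast: quick unit tests
    integration: tests that talk to external services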

Testing ML Preprocessing

# preprocessing.py
def normalize(data):
    """Min-max normalization."""
    min_val = min(data)
    max_val = max(data)
    return [(x - min_val) / (max_val - min_val) for x in data]

# test_preprocessing.py
import pytest

from preprocessing import normalize

def test_normalize():
    data = [0, 50, 100]
    result = normalize(data)
    assert result == [0.0, 0.5, 1.0]

def test_normalize_single_value():
    # A single value makes max == min, so naive min-max scaling divides by zero
    with pytest.raises(ZeroDivisionError):
        normalize([5])
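
If you would rather have normalize tolerate these inputs than raise (a design choice; the edge-case tests in the Best Practices section below assume this behavior), a defensive sketch could be:

def normalize(data):
    """Min-max normalization that tolerates empty and constant input."""
    if not data:
        return []
    min_val, max_val = min(data), max(data)
    if max_val == min_val:
        # All values identical: map everything to 0.0 by convention
        return [0.0 for _ in data]
    return [(x - min_val) / (max_val - min_val) for x in data]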

Fixtures: Reusable Test Data

Fixtures provide reusable setup code for tests:

import pytest

@pytest.fixture
def sample_dataset():
    """Create sample data for tests."""
    return {
        'X': [[1, 2], [3, 4], [5, 6]],
        'y': [0, 1, 0]
    }

def test_model_training(sample_dataset):
    model = RandomForest()
    model.fit(sample_dataset['X'], sample_dataset['y'])
    assert model.is_trained == True

def test_model_prediction(sample_dataset):
    model = RandomForest()
    model.fit(sample_dataset['X'], sample_dataset['y'])
    predictions = model.predict([[7, 8]])
    assert len(predictions) == 1

Fixture Scopes:

# Function scope (default) - run before each test
@pytest.fixture
def temp_file():
    f = create_temp_file()
    yield f
    f.close()  # Cleanup after test

# Module scope - run once per module
@pytest.fixture(scope="module")
def database_connection():
    conn = connect_to_db()
    yield conn
    conn.close()

# Session scope - run once per test session
@pytest.fixture(scope="session")
def trained_model():
    model = train_expensive_model()  # Runs once!
    return model
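
pytest also ships built-in fixtures you can use without defining anything. tmp_path, for example, gives every test its own temporary directory:

def test_save_and_load(tmp_path):
    # tmp_path is a pathlib.Path pointing to a per-test temporary directory
    path = tmp_path / "data.txt"
    path.write_text("1,2,3")
    assert path.read_text() == "1,2,3"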

Fixture Chaining:

@pytest.fixture
def database():
    return Database()

@pytest.fixture
def user(database):
    # Use database fixture
    return database.create_user("test_user")

def test_user_operations(user):
    # Has access to user (which used database)
    assert user.name == "test_user"
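
Fixtures shared across several test files usually live in a conftest.py at the root of your tests directory, where pytest discovers them automatically. A minimal sketch reusing the sample_dataset fixture from above:

# tests/conftest.py
import pytest

@pytest.fixture
def sample_dataset():
    return {
        'X': [[1, 2], [3, 4], [5, 6]],
        'y': [0, 1, 0]
    }

Any test in that directory (or below it) can request sample_dataset by name, no import required.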

Testing ML Models

Test Model Interface

def test_model_has_fit_method():
    model = MyMLModel()
    assert hasattr(model, 'fit')
    assert callable(model.fit)

def test_model_has_predict_method():
    model = MyMLModel()
    assert hasattr(model, 'predict')
    assert callable(model.predict)

Test Model Output Shape

def test_model_output_shape():
    model = MyMLModel()
    model.fit(X_train, y_train)
    
    predictions = model.predict(X_test)
    assert predictions.shape == (len(X_test),)

def test_model_probability_range():
    model = MyMLModel()
    model.fit(X_train, y_train)
    
    probabilities = model.predict_proba(X_test)
    assert all(0 <= p <= 1 for p in probabilities)
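
A self-contained version of these shape and probability checks, using scikit-learn and a module-scoped fixture (assuming scikit-learn and NumPy are installed; MyMLModel above is just a placeholder), might look like:

import numpy as np
import pytest
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

@pytest.fixture(scope="module")
def train_test_data():
    # Small synthetic dataset keeps the test fast and deterministic
    X, y = make_classification(n_samples=100, n_features=5, random_state=42)
    return X[:80], X[80:], y[:80], y[80:]

def test_output_shape_and_probabilities(train_test_data):
    X_train, X_test, y_train, y_test = train_test_data
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    assert predictions.shape == (len(X_test),)

    # predict_proba returns one row per sample; each row sums to 1
    probabilities = model.predict_proba(X_test)
    assert probabilities.shape == (len(X_test), 2)
    assert np.all((probabilities >= 0) & (probabilities <= 1))
    assert np.allclose(probabilities.sum(axis=1), 1.0)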

Test Data Preprocessing

def test_remove_missing_values():
    data = [1, 2, None, 4, None, 6]
    cleaned = remove_nulls(data)
    assert cleaned == [1, 2, 4, 6]
    assert None not in cleaned

def test_feature_scaling():
    data = [10, 20, 30, 40, 50]
    scaled = scale_features(data)
    assert min(scaled) == 0
    assert max(scaled) == 1

Mocking: Test Without Dependencies

Mocking lets you test code without calling real databases, APIs, or external services:

from unittest.mock import Mock, patch, MagicMock

def test_api_call():
    # Mock external API
    with patch('requests.get') as mock_get:
        mock_get.return_value.json.return_value = {'data': [1, 2, 3]}
        
        result = fetch_data_from_api()
        assert result == [1, 2, 3]
        mock_get.assert_called_once()

def test_api_call_with_params():
    with patch('requests.get') as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = {'users': []}
        
        fetch_users(limit=10)
        
        # Verify it was called with correct params
        mock_get.assert_called_with(
            'https://api.example.com/users',
            params={'limit': 10}
        )
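
For context, the code under test that these mocks assume might look like this (a hypothetical api_client module, not a real library):

# api_client.py (hypothetical module the tests above assume)
import requests

def fetch_data_from_api():
    response = requests.get('https://api.example.com/data')
    return response.json()['data']

def fetch_users(limit):
    return requests.get('https://api.example.com/users', params={'limit': limit})

One caveat: patch replaces the name where it is looked up. Because this module calls requests.get through the requests module, patching 'requests.get' works; if it did from requests import get, you would patch 'api_client.get' instead.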

Mocking ML Model Training:

def test_model_saving():
    # Don't actually train model in test!
    with patch.object(MyModel, 'fit') as mock_fit:
        model = MyModel()
        model.fit(X_train, y_train)
        
        # Verify fit was called
        mock_fit.assert_called_once()
        
        # Mock predictions
        with patch.object(model, 'predict', return_value=[1, 0, 1]):
            predictions = model.predict(X_test)
            assert predictions == [1, 0, 1]

Mock Return Values:

# Simple mock
mock = Mock(return_value=42)
assert mock() == 42

# Side effects (different values on each call)
mock = Mock(side_effect=[1, 2, 3])
assert mock() == 1
assert mock() == 2
assert mock() == 3

# Raise exception
mock = Mock(side_effect=ValueError("Invalid input"))
# mock() raises ValueError

Test-Driven Development (TDD)

Write test first, then code!

# 1. Write failing test
def test_accuracy_calculation():
    predictions = [1, 0, 1, 1]
    actual = [1, 0, 0, 1]
    accuracy = calculate_accuracy(predictions, actual)
    assert accuracy == 0.75

# 2. Write minimal code to pass
def calculate_accuracy(predictions, actual):
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(predictions)

# 3. Refactor (improve code while tests still pass)
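
The cycle then repeats: write the next failing test (here, a hypothetical edge case for empty inputs), then extend the code just enough to make it pass:

# 4. Next cycle: a new failing test for an edge case
def test_accuracy_empty_input():
    assert calculate_accuracy([], []) == 0.0

# 5. Extend the implementation to make it pass
def calculate_accuracy(predictions, actual):
    if not predictions:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(predictions)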

Unittest: Standard Library Testing

Python's built-in testing framework:

import unittest

class TestCalculator(unittest.TestCase):
    def setUp(self):
        """Run before each test."""
        self.calc = Calculator()
    
    def tearDown(self):
        """Run after each test."""
        self.calc = None
    
    def test_add(self):
        result = self.calc.add(2, 3)
        self.assertEqual(result, 5)
    
    def test_divide_by_zero(self):
        with self.assertRaises(ValueError):
            self.calc.divide(10, 0)
    
    def test_list_contains(self):
        result = [1, 2, 3]
        self.assertIn(2, result)
        self.assertNotIn(4, result)

if __name__ == '__main__':
    unittest.main()

Unittest Assertions:

  • assertEqual(a, b) - Check a == b
  • assertTrue(x) - Check x is True
  • assertIsNone(x) - Check x is None
  • assertIn(a, b) - Check a in b
  • assertRaises(Exception) - Check exception raised
  • assertAlmostEqual(a, b) - Check floats nearly equal
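
assertAlmostEqual is especially handy for ML metrics, where floating-point arithmetic rarely gives exact values (the pytest equivalent is pytest.approx). A small example:

import unittest

class TestMetrics(unittest.TestCase):
    def test_accuracy_is_close(self):
        accuracy = 2 / 3
        # Passes if the values agree to the given number of decimal places
        self.assertAlmostEqual(accuracy, 0.6667, places=4)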

Testing Async Code

Async tests need the pytest-asyncio plugin (pip install pytest-asyncio):

import pytest
import asyncio

async def fetch_data_async(url):
    await asyncio.sleep(1)
    return {"data": [1, 2, 3]}

@pytest.mark.asyncio
async def test_async_function():
    result = await fetch_data_async("http://api.example.com")
    assert result['data'] == [1, 2, 3]

# Async fixtures also come from pytest-asyncio
import pytest_asyncio

@pytest_asyncio.fixture
async def async_client():
    client = AsyncAPIClient()
    await client.connect()
    yield client
    await client.disconnect()

Common Testing Mistakes

Mistake 1: Testing Implementation Details

# Bad - tests internal implementation
def test_sort_uses_quicksort():
    data = [3, 1, 2]
    sorter = Sorter()
    sorter.sort(data)
    assert sorter._algorithm == 'quicksort'  # Don't test this!

# Good - test behavior
def test_sort_orders_data():
    data = [3, 1, 2]
    result = sort(data)
    assert result == [1, 2, 3]  # Test what it does

Mistake 2: Interdependent Tests

# Bad - tests depend on each other
class TestUser:
    user = None
    
    def test_create_user(self):
        self.user = create_user("alice")
        assert self.user.name == "alice"
    
    def test_update_user(self):
        # Depends on test_create_user running first!
        self.user.update(email="new@example.com")

# Good - each test is independent
class TestUser:
    def test_create_user(self):
        user = create_user("alice")
        assert user.name == "alice"
    
    def test_update_user(self):
        user = create_user("alice")  # Create own test data
        user.update(email="new@example.com")
        assert user.email == "new@example.com"

Mistake 3: No Assertions

# Bad - no assertion!
def test_process_data():
    process_data([1, 2, 3])  # Just calling the function

# Good - verify behavior
def test_process_data():
    result = process_data([1, 2, 3])
    assert len(result) == 3
    assert all(x > 0 for x in result)

Test Coverage

# Install coverage
pip install pytest-cov

# Run with coverage
pytest --cov=myproject tests/

# Generate HTML report
pytest --cov=myproject --cov-report=html

# Aim for 80%+ coverage, but quality > quantity!
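
To have the test run fail when coverage drops below a threshold (useful in CI), pytest-cov provides a fail-under flag:

# Fail the run if total coverage is below 80%
pytest --cov=myproject --cov-fail-under=80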

CI/CD Integration

GitHub Actions:

# .github/workflows/test.yml
name: Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run tests
        run: pytest --cov=myproject tests/

Best Practices

1. Test Edge Cases

def test_normalize_edge_cases():
    # Empty list stays empty
    assert normalize([]) == []
    
    # Single value: max == min, so map to 0.0 instead of dividing by zero
    assert normalize([5]) == [0.0]
    
    # All identical values
    assert normalize([3, 3, 3]) == [0.0, 0.0, 0.0]
    
    # Negative numbers still map onto [0, 1]
    result = normalize([-10, 0, 10])
    assert result == [0.0, 0.5, 1.0]

2. Use Descriptive Test Names

# Bad
def test_model():
    pass

# Good
def test_model_predicts_correct_class_for_positive_examples():
    pass

# Alternative naming convention
def test_GIVEN_trained_model_WHEN_predicting_THEN_returns_probabilities():
    pass

3. Arrange-Act-Assert Pattern

def test_user_registration():
    # Arrange - set up test data
    email = "test@example.com"
    password = "secure123"
    
    # Act - perform the action
    user = register_user(email, password)
    
    # Assert - verify result
    assert user.email == email
    assert user.is_active == True

4. Test Behavior, Not Implementation

# Bad - coupled to implementation
def test_cache_uses_dictionary():
    cache = Cache()
    assert isinstance(cache._storage, dict)

# Good - tests behavior
def test_cache_stores_and_retrieves_values():
    cache = Cache()
    cache.set('key', 'value')
    assert cache.get('key') == 'value'

5. Keep Tests Fast

# Slow tests discourage running them
@pytest.mark.slow  # Mark slow tests
def test_train_production_model():
    model.fit(full_dataset)  # Takes 10 minutes

# Fast tests run frequently
def test_model_interface():
    model = Model()
    assert hasattr(model, 'fit')  # Instant

Running Tests

# Run all tests
pytest

# Run specific file
pytest test_models.py

# Run specific test
pytest test_models.py::test_model_training

# Run with coverage
pytest --cov=myproject

# Run verbose
pytest -v

# Stop on first failure
pytest -x

# Run last failed tests
pytest --lf

# Run tests matching pattern
pytest -k "test_model"

# Show print statements
pytest -s

Key Takeaways

  1. Prefer pytest - simpler syntax, richer features, and less boilerplate than unittest
  2. Write tests first - TDD methodology helps design better code
  3. Use fixtures - DRY principle for test data setup
  4. Mock external dependencies - tests should be fast and isolated
  5. Test behavior, not implementation - tests should survive refactoring
  6. Aim for 80%+ coverage - but quality matters more than quantity
  7. Keep tests fast - developers run fast tests more often
  8. Test edge cases - empty inputs, None values, negative numbers, and large values
  9. Use CI/CD - automate testing on every commit
  10. Descriptive test names - tests document your code

Testing transforms unreliable code into production-ready software. Start small—even a few tests are better than none. As your test suite grows, so does your confidence in making changes!

Remember: The best time to write tests is before bugs reach production. The second best time is now. Every test you write is an investment in your code's future reliability, maintainability, and your own peace of mind when deploying to production.


Connect with me on Twitter or LinkedIn.

Support My Work

If this guide helped you master Python testing with pytest and unittest, write better tests, or improve your code quality, I'd really appreciate your support! Creating comprehensive, practical testing tutorials like this takes significant time and effort. Your support helps me continue sharing knowledge and creating more helpful resources for Python developers.

☕ Buy me a coffee - Every contribution, big or small, means the world to me and keeps me motivated to create more content!


Cover image by Nguyen Dang Hoang Nhu on Unsplash

© ojaswiat.com 2025-2027