Python Modules and Packages: Organize Code Like a Pro
Master Python modules and packages for scalable projects. Learn imports, __init__.py, package structure, relative imports, and best practices for ML projects

When My Project Became a Mess
My ML project was one giant file, 3000 lines of chaos:
my_project.py # Everything in one file!
Then I learned about modules and packages:
my_project/
├── __init__.py
├── models/
│   ├── __init__.py
│   ├── random_forest.py
│   └── neural_net.py
├── preprocessing/
│   ├── __init__.py
│   └── scalers.py
└── utils/
    ├── __init__.py
    └── metrics.py
Organization = Professional code!
Modules: Single Files
A module is simply a Python file containing definitions and statements. Every .py file you create is automatically a module.
Creating Your First Module
# math_utils.py (module)
def add(a, b):
    """Add two numbers."""
    return a + b

def multiply(a, b):
    """Multiply two numbers."""
    return a * b

def power(base, exponent):
    """Calculate base raised to exponent."""
    return base ** exponent

PI = 3.14159
E = 2.71828

class Calculator:
    """Simple calculator class."""
    def __init__(self):
        self.history = []

    def calculate(self, operation, a, b):
        result = operation(a, b)
        self.history.append(f"{operation.__name__}({a}, {b}) = {result}")
        return result
Using Modules
# main.py
import math_utils
# Access functions
result = math_utils.add(5, 3)
print(result) # 8
# Access constants
print(math_utils.PI) # 3.14159
# Access classes
calc = math_utils.Calculator()
calc.calculate(math_utils.add, 10, 5)
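Once imported, a module is an ordinary object you can inspect. A minimal sketch, using the stdlib math module as a stand-in for math_utils:

```python
import math  # stdlib module, standing in for math_utils

# dir() lists every name a module defines
public = [name for name in dir(math) if not name.startswith("_")]
print("pi" in public)     # True

# getattr() looks an attribute up dynamically by name
sqrt = getattr(math, "sqrt")
print(sqrt(16))           # 4.0
```

This is handy when exploring an unfamiliar module in the REPL.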
Module Attributes
Every module has special attributes:
# math_utils.py
print(__name__)  # Module name
print(__file__)  # File path
print(__doc__)   # Module docstring

def main():
    print("Running as script")

# Run only when executed directly
if __name__ == "__main__":
    main()
When you import: __name__ is "math_utils".
When you run directly: __name__ is "__main__".
Packages: Directories with __init__.py
A package is a directory containing __init__.py. This special file tells Python "this directory is a package."
Basic Package Structure
ml_toolkit/
├── __init__.py       # Makes it a package
├── models.py
├── preprocessing.py
└── utils.py
The __init__.py File
__init__.py can be empty, but it's powerful when used properly:
# ml_toolkit/__init__.py
"""
ML Toolkit - Machine Learning Utilities
"""
# Import key components for convenient access
from .models import RandomForest, NeuralNetwork
from .preprocessing import scale_data, normalize
from .utils import save_model, load_model
# Define public API
__all__ = ['RandomForest', 'NeuralNetwork', 'scale_data', 'normalize']
# Package metadata
__version__ = '1.0.0'
__author__ = 'Your Name'
# Initialize package-level variables
DEFAULT_CONFIG = {
    'random_state': 42,
    'verbose': True
}
print(f"Loaded ML Toolkit v{__version__}")
Now users can import directly:
from ml_toolkit import RandomForest, scale_data
# Instead of: from ml_toolkit.models import RandomForest
Nested Packages
Packages can contain subpackages:
ml_toolkit/
├── __init__.py
├── models/
│   ├── __init__.py
│   ├── classification/
│   │   ├── __init__.py
│   │   ├── random_forest.py
│   │   └── svm.py
│   └── regression/
│       ├── __init__.py
│       └── linear.py
├── preprocessing/
│   ├── __init__.py
│   ├── scalers.py
│   └── encoders.py
└── utils/
    ├── __init__.py
    └── metrics.py
# Access nested modules
from ml_toolkit.models.classification import random_forest
from ml_toolkit.preprocessing.scalers import StandardScaler
Import Styles
Different Ways to Import
# 1. Import entire module
import numpy
result = numpy.array([1, 2, 3])
# 2. Import specific function/class
from numpy import array
result = array([1, 2, 3])
# 3. Import with alias (most common for popular libraries)
import pandas as pd
df = pd.DataFrame()
# 4. Import multiple items
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
# 5. Import submodule
from sklearn.ensemble import RandomForestClassifier
# 6. Import everything (AVOID - pollutes namespace!)
from numpy import * # Don't do this!
When to Use Each Style
Use full import when you need many functions from a module:
import math
x = math.sin(math.pi / 2)
y = math.cos(0)
z = math.sqrt(16)
Use specific imports when you only need a few items:
from math import sin, cos, pi
x = sin(pi / 2)
y = cos(0)
Use aliases for commonly used libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
Why Avoid from module import *
# Bad - unclear where functions come from
from math import *
from numpy import *
result = sqrt(16) # math.sqrt or numpy.sqrt?
# Good - explicit is better
import math
import numpy as np
result = math.sqrt(16)
array_result = np.sqrt(np.array([16, 25, 36]))
ML Project Structure
ml_project/
├── __init__.py
├── data/
│   ├── __init__.py
│   ├── loader.py
│   └── preprocessor.py
├── models/
│   ├── __init__.py
│   ├── base.py
│   ├── random_forest.py
│   └── neural_net.py
├── training/
│   ├── __init__.py
│   └── trainer.py
├── evaluation/
│   ├── __init__.py
│   └── metrics.py
└── utils/
    ├── __init__.py
    └── helpers.py
Relative vs Absolute Imports
Absolute Imports (Recommended)
Absolute imports specify the full path from the project root:
# In ml_project/models/neural_net.py
from ml_project.models.random_forest import RandomForest
from ml_project.data.loader import load_data
from ml_project.utils.metrics import accuracy_score
Advantages:
- Clear and explicit
- Works from anywhere
- Easier to understand
- Better for large projects
Relative Imports
Relative imports use dots to navigate the package hierarchy:
# In ml_project/models/neural_net.py
# Import from same directory
from .random_forest import RandomForest
# Import from parent directory
from ..data.loader import load_data
# Import from sibling directory
from ..utils.metrics import accuracy_score
# Go up two levels (only valid from a module nested at least two packages deep)
from ...config import settings
Dot notation:
- . = current package
- .. = parent package
- ... = grandparent package
When to Use Each
Use absolute imports:
- In entry point scripts
- When clarity is important
- In large, complex projects
- When module might move later
Use relative imports:
- Within a tightly coupled package
- When you want package portability
- For internal package structure
Common Relative Import Mistake
# This FAILS in script run directly
# Can only use relative imports inside packages
# wrong.py (run as script)
from .utils import helper # Error: attempted relative import with no known parent package
Module Search Path
Python searches for modules in specific locations. Understanding this prevents import errors.
How Python Finds Modules
import sys
print(sys.path)
# Output:
# [
# '/current/directory',
# '/usr/lib/python3.9',
# '/usr/lib/python3.9/site-packages',
# ...
# ]
Search order:
- The directory of the script being run (or the current directory in interactive mode)
- PYTHONPATH environment variable directories
- Standard library directories
- Site-packages (installed packages)
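You can ask Python where it would load a module from, without actually importing it, using importlib.util.find_spec. A quick sketch with the stdlib json module:

```python
import importlib.util

# find_spec walks sys.path and returns a ModuleSpec, or None if not found
spec = importlib.util.find_spec("json")
print(spec.name)    # json
print(spec.origin)  # filesystem path of the stdlib json package

# A missing top-level module yields None instead of raising
print(importlib.util.find_spec("no_such_module_xyz"))  # None
```

This is a fast way to debug "which copy of this module is Python actually seeing?"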
Adding Custom Paths
# Method 1: Modify sys.path
import sys
sys.path.append('/path/to/my/modules')
# Method 2: Use PYTHONPATH environment variable
# In terminal:
# export PYTHONPATH="/path/to/my/modules:$PYTHONPATH"
# Method 3: Use .pth file in site-packages
# Create mymodules.pth with path to your modules
Practical Example
# Project structure
my_project/
├── main.py
├── config.py
└── src/
    ├── __init__.py
    ├── models.py
    └── utils.py
# main.py
import sys
from pathlib import Path
# Add src to path
src_path = Path(__file__).parent / 'src'
sys.path.insert(0, str(src_path))
# Now can import from src
from models import MyModel
from utils import helper_function
Common Import Errors and Solutions
Error 1: ModuleNotFoundError
import my_module # ModuleNotFoundError: No module named 'my_module'
Solutions:
- Check that the file exists in the current directory
- Verify __init__.py exists in package directories
- Check that sys.path includes the module's directory
- Install the package if it's external: pip install my_module
Error 2: Circular Imports
# module_a.py
from module_b import function_b

def function_a():
    return function_b()

# module_b.py
from module_a import function_a  # Circular import!

def function_b():
    return function_a()
Solution: Restructure the code, or move the import inside the function:
# module_b.py
def function_b():
    from module_a import function_a  # Import inside function
    return function_a()
Error 3: Relative Import Beyond Top-Level
# Attempting to go beyond package root
from ...something import anything # ValueError: attempted relative import beyond top-level package
Solution: Use absolute imports or restructure package hierarchy.
Error 4: Name Conflicts
# Shadowing standard library
import json # Our json.py file!
data = json.loads('{"key": "value"}') # AttributeError: module 'json' has no attribute 'loads'
Solution: Rename your file to avoid conflicts with standard library or installed packages.
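A quick way to diagnose shadowing is a module's __file__ attribute, which shows exactly which file Python imported:

```python
import json

# If __file__ pointed at a json.py inside your project directory,
# your file would be shadowing the standard library.
print(json.__file__)  # .../lib/python3.x/json/__init__.py
```

If the printed path is inside your project instead of the Python installation, you've found the culprit.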
Best Practices
1. Use Meaningful Names
# Bad
import utils
from helpers import func
# Good
import data_preprocessing_utils
from validation_helpers import validate_email
2. Keep __init__.py Clean
# Good __init__.py
"""Package for data processing utilities."""
from .preprocessing import clean_data, normalize
from .validation import validate_input
__all__ = ['clean_data', 'normalize', 'validate_input']
__version__ = '1.0.0'
Avoid complex logic in __init__.py - it runs on every import!
3. Organize by Functionality
# Good structure
ml_project/
βββ data/ # Data-related modules
βββ models/ # Model implementations
βββ training/ # Training logic
βββ evaluation/ # Evaluation metrics
βββ utils/ # General utilities
4. Use __all__ to Define Public API
# models.py
__all__ = ['RandomForest', 'NeuralNetwork'] # Public API
class RandomForest:
    pass

class NeuralNetwork:
    pass

class _PrivateHelper:  # Not exported
    pass
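To see __all__ in action without creating files, here is a sketch that builds a throwaway in-memory module and star-imports from it (demo_api is a made-up name used only for this demo):

```python
import sys
import types

# Build a module object in memory instead of on disk
mod = types.ModuleType("demo_api")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn(): return 'ok'\n"
    "def _private(): return 'hidden'\n",
    mod.__dict__,
)
sys.modules["demo_api"] = mod  # register it so import statements can find it

ns = {}
exec("from demo_api import *", ns)  # star import honors __all__
print("public_fn" in ns)  # True
print("_private" in ns)   # False
```

Only the names listed in __all__ are pulled in by the star import; everything else stays private.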
5. Document Your Modules
"""
Module: data_preprocessing
This module provides utilities for preprocessing raw data including:
- Data cleaning
- Normalization
- Feature extraction
Example:
from data_preprocessing import clean_data
cleaned = clean_data(raw_data)
"""
6. Avoid Circular Dependencies
Organize imports to create a dependency tree, not a web:
# Good architecture
config → utils → models → training → main
# Bad architecture (circular)
models → training → utils → models
Real-World Example: Complete ML Project
Let's build a proper structure for a machine learning project:
ml_classification/
├── __init__.py
├── setup.py              # For installation
├── data/
│   ├── __init__.py
│   ├── loader.py         # Data loading
│   └── preprocessor.py   # Data preprocessing
├── models/
│   ├── __init__.py
│   ├── base.py           # Base model class
│   ├── random_forest.py
│   └── neural_net.py
├── training/
│   ├── __init__.py
│   ├── trainer.py
│   └── callbacks.py
├── evaluation/
│   ├── __init__.py
│   └── metrics.py
└── utils/
    ├── __init__.py
    ├── config.py
    └── logging.py
data/loader.py:
"""Data loading utilities."""
import pandas as pd
from pathlib import Path
class DataLoader:
    def __init__(self, data_path: str):
        self.data_path = Path(data_path)

    def load_csv(self) -> pd.DataFrame:
        return pd.read_csv(self.data_path)
models/__init__.py:
"""Model implementations."""
from .random_forest import RandomForestModel
from .neural_net import NeuralNetModel
__all__ = ['RandomForestModel', 'NeuralNetModel']
Main script:
# main.py
from ml_classification.data.loader import DataLoader
from ml_classification.data.preprocessor import Preprocessor
from ml_classification.models import RandomForestModel
from ml_classification.training.trainer import Trainer
from ml_classification.evaluation.metrics import calculate_accuracy
# Load data
loader = DataLoader('data/train.csv')
data = loader.load_csv()
# Preprocess
preprocessor = Preprocessor()
X, y = preprocessor.prepare(data)
# Train model
model = RandomForestModel()
trainer = Trainer(model)
trainer.fit(X, y)
# Evaluate
accuracy = calculate_accuracy(model, X, y)
print(f"Accuracy: {accuracy}")
Making Your Package Installable
Create setup.py:
from setuptools import setup, find_packages
setup(
    name='ml_classification',
    version='1.0.0',
    packages=find_packages(),
    install_requires=[
        'numpy>=1.20.0',
        'pandas>=1.3.0',
        'scikit-learn>=1.0.0',
    ],
    author='Your Name',
    description='ML Classification Package',
    python_requires='>=3.8',
)
Install in development mode:
pip install -e .
Now you can import from anywhere:
from ml_classification import RandomForestModel
Advanced Topics
Lazy Imports for Performance
Lazy imports delay module loading until actually needed:
# Instead of importing at the top (slows startup):
# import heavy_ml_library

def train_model():
    # Import only when the function is called
    import heavy_ml_library
    model = heavy_ml_library.Model()
    return model
When to use:
- Large libraries that aren't always needed
- Speeding up script startup time
- Optional dependencies
Module Reloading (Development)
During development, reload modules without restarting Python:
import importlib
import my_module
# Make changes to my_module.py...
# Reload the module
importlib.reload(my_module)
Warning: Reloading can cause issues with existing instances. Use mainly in interactive sessions.
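A self-contained sketch of reload in action: it writes a throwaway module into a temp directory, imports it, edits the file on disk, and reloads (reload_demo is a made-up module name for this demo):

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module into a temp directory
mod_dir = tempfile.mkdtemp()
path = os.path.join(mod_dir, "reload_demo.py")
with open(path, "w") as f:
    f.write("VALUE = 1\n")
sys.path.insert(0, mod_dir)

import reload_demo
print(reload_demo.VALUE)  # 1

# Change the file, then reload to pick up the edit
with open(path, "w") as f:
    f.write("VALUE = 2  # edited\n")
importlib.reload(reload_demo)
print(reload_demo.VALUE)  # 2
```

Note that objects created from the old version of the module are not updated by the reload, which is why it's best kept to interactive sessions.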
Conditional Imports
Handle optional dependencies gracefully:
try:
    import matplotlib.pyplot as plt
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False

def plot_data(data):
    if not HAS_MATPLOTLIB:
        print("Matplotlib not available. Install with: pip install matplotlib")
        return
    plt.plot(data)
    plt.show()
Key Takeaways
- Modules are Python files; packages are directories with __init__.py
- Use absolute imports for clarity and maintainability
- Understand sys.path to troubleshoot import errors
- Organize code by functionality in separate packages
- Use __all__ to define your public API
- Avoid circular dependencies through proper architecture
- Make packages installable with setup.py for reusability
Proper module organization transforms chaotic code into professional, maintainable projects. Start small, refactor as you grow, and your future self will thank you!