Python Modules and Packages: Organize Code Like a Pro
Master Python modules and packages for scalable projects. Learn imports, __init__.py, package structure, relative imports, and best practices for ML projects

When My Project Became a Mess
My ML project was one giant file, 3000 lines of chaos:
my_project.py # Everything in one file!
Then I learned about modules and packages:
my_project/
├── __init__.py
├── models/
│   ├── __init__.py
│   ├── random_forest.py
│   └── neural_net.py
├── preprocessing/
│   ├── __init__.py
│   └── scalers.py
└── utils/
    ├── __init__.py
    └── metrics.py
Organization = Professional code!
Modules: Single Files
A module is simply a Python file containing definitions and statements. Every .py file you create is automatically a module.
Creating Your First Module
# math_utils.py (module)
def add(a, b):
    """Add two numbers."""
    return a + b

def multiply(a, b):
    """Multiply two numbers."""
    return a * b

def power(base, exponent):
    """Calculate base raised to exponent."""
    return base ** exponent

PI = 3.14159
E = 2.71828

class Calculator:
    """Simple calculator class."""
    def __init__(self):
        self.history = []

    def calculate(self, operation, a, b):
        result = operation(a, b)
        self.history.append(f"{operation.__name__}({a}, {b}) = {result}")
        return result
Using Modules
# main.py
import math_utils
# Access functions
result = math_utils.add(5, 3)
print(result) # 8
# Access constants
print(math_utils.PI) # 3.14159
# Access classes
calc = math_utils.Calculator()
calc.calculate(math_utils.add, 10, 5)
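Once imported, a module is an ordinary object you can inspect. A minimal sketch, using the stdlib math module as a stand-in for math_utils:

```python
import math  # stdlib module, standing in for math_utils

# dir() lists every name a module defines
public = [name for name in dir(math) if not name.startswith("_")]
print("pi" in public)     # True

# getattr() looks an attribute up dynamically by name
sqrt = getattr(math, "sqrt")
print(sqrt(16))           # 4.0
```

This is handy when exploring an unfamiliar module in the REPL.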
Module Attributes
Every module has special attributes:
# math_utils.py
print(__name__)  # Module name
print(__file__)  # File path
print(__doc__)   # Module docstring

def main():
    print("Running as script")

# Run only when executed directly
if __name__ == "__main__":
    main()
When you import: __name__ is "math_utils".
When you run directly: __name__ is "__main__".
Packages: Directories with __init__.py
A package is a directory containing __init__.py. This special file tells Python "this directory is a package."
Basic Package Structure
ml_toolkit/
├── __init__.py       # Makes it a package
├── models.py
├── preprocessing.py
└── utils.py
The __init__.py File
__init__.py can be empty, but it's powerful when used properly:
# ml_toolkit/__init__.py
"""
ML Toolkit - Machine Learning Utilities
"""
# Import key components for convenient access
from .models import RandomForest, NeuralNetwork
from .preprocessing import scale_data, normalize
from .utils import save_model, load_model
# Define public API
__all__ = ['RandomForest', 'NeuralNetwork', 'scale_data', 'normalize']
# Package metadata
__version__ = '1.0.0'
__author__ = 'Your Name'
# Initialize package-level variables
DEFAULT_CONFIG = {
    'random_state': 42,
    'verbose': True
}
print(f"Loaded ML Toolkit v{__version__}")
Now users can import directly:
from ml_toolkit import RandomForest, scale_data
# Instead of: from ml_toolkit.models import RandomForest
Nested Packages
Packages can contain subpackages:
ml_toolkit/
├── __init__.py
├── models/
│   ├── __init__.py
│   ├── classification/
│   │   ├── __init__.py
│   │   ├── random_forest.py
│   │   └── svm.py
│   └── regression/
│       ├── __init__.py
│       └── linear.py
├── preprocessing/
│   ├── __init__.py
│   ├── scalers.py
│   └── encoders.py
└── utils/
    ├── __init__.py
    └── metrics.py
# Access nested modules
from ml_toolkit.models.classification import random_forest
from ml_toolkit.preprocessing.scalers import StandardScaler
Import Styles
Different Ways to Import
# 1. Import entire module
import numpy
result = numpy.array([1, 2, 3])
# 2. Import specific function/class
from numpy import array
result = array([1, 2, 3])
# 3. Import with alias (most common for popular libraries)
import pandas as pd
df = pd.DataFrame()
# 4. Import multiple items
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
# 5. Import submodule
from sklearn.ensemble import RandomForestClassifier
# 6. Import everything (AVOID - pollutes namespace!)
from numpy import * # Don't do this!
When to Use Each Style
Use full import when you need many functions from a module:
import math
x = math.sin(math.pi / 2)
y = math.cos(0)
z = math.sqrt(16)
Use specific imports when you only need a few items:
from math import sin, cos, pi
x = sin(pi / 2)
y = cos(0)
Use aliases for commonly used libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
Why Avoid from module import *
# Bad - unclear where functions come from
from math import *
from numpy import *
result = sqrt(16) # math.sqrt or numpy.sqrt?
# Good - explicit is better
import math
import numpy as np
result = math.sqrt(16)
array_result = np.sqrt(np.array([16, 25, 36]))
ML Project Structure
ml_project/
├── __init__.py
├── data/
│   ├── __init__.py
│   ├── loader.py
│   └── preprocessor.py
├── models/
│   ├── __init__.py
│   ├── base.py
│   ├── random_forest.py
│   └── neural_net.py
├── training/
│   ├── __init__.py
│   └── trainer.py
├── evaluation/
│   ├── __init__.py
│   └── metrics.py
└── utils/
    ├── __init__.py
    └── helpers.py
Relative vs Absolute Imports
Absolute Imports (Recommended)
Absolute imports specify the full path from the project root:
# In ml_project/models/neural_net.py
from ml_project.models.random_forest import RandomForest
from ml_project.data.loader import load_data
from ml_project.utils.metrics import accuracy_score
Advantages:
- Clear and explicit
- Works from anywhere
- Easier to understand
- Better for large projects
Relative Imports
Relative imports use dots to navigate the package hierarchy:
# In ml_project/models/neural_net.py
# Import from same directory
from .random_forest import RandomForest
# Import from parent directory
from ..data.loader import load_data
# Import from sibling directory
from ..utils.metrics import accuracy_score
# Go up two levels (only valid from a module nested at least two packages deep)
from ...config import settings
Dot notation:
- . = current package
- .. = parent package
- ... = grandparent package
When to Use Each
Use absolute imports:
- In entry point scripts
- When clarity is important
- In large, complex projects
- When module might move later
Use relative imports:
- Within a tightly coupled package
- When you want package portability
- For internal package structure
Common Relative Import Mistake
# This FAILS in script run directly
# Can only use relative imports inside packages
# wrong.py (run as script)
from .utils import helper # Error: attempted relative import with no known parent package
Module Search Path
Python searches for modules in specific locations. Understanding this prevents import errors.
How Python Finds Modules
import sys
print(sys.path)
# Output:
# [
# '/current/directory',
# '/usr/lib/python3.9',
# '/usr/lib/python3.9/site-packages',
# ...
# ]
Search order:
- The directory of the script being run (or the current directory in interactive mode)
- PYTHONPATH environment variable directories
- Standard library directories
- Site-packages (installed packages)
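You can ask Python where it would load a module from, without actually importing it, using importlib.util.find_spec. A quick sketch with the stdlib json module:

```python
import importlib.util

# find_spec walks sys.path and returns a ModuleSpec, or None if not found
spec = importlib.util.find_spec("json")
print(spec.name)    # json
print(spec.origin)  # filesystem path of the stdlib json package

# A missing top-level module yields None instead of raising
print(importlib.util.find_spec("no_such_module_xyz"))  # None
```

This is a fast way to debug "which copy of this module is Python actually seeing?"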
Adding Custom Paths
# Method 1: Modify sys.path
import sys
sys.path.append('/path/to/my/modules')
# Method 2: Use PYTHONPATH environment variable
# In terminal:
# export PYTHONPATH="/path/to/my/modules:$PYTHONPATH"
# Method 3: Use .pth file in site-packages
# Create mymodules.pth with path to your modules
Practical Example
# Project structure
my_project/
├── main.py
├── config.py
└── src/
    ├── __init__.py
    ├── models.py
    └── utils.py
# main.py
import sys
from pathlib import Path
# Add src to path
src_path = Path(__file__).parent / 'src'
sys.path.insert(0, str(src_path))
# Now can import from src
from models import MyModel
from utils import helper_function
Common Import Errors and Solutions
Error 1: ModuleNotFoundError
import my_module # ModuleNotFoundError: No module named 'my_module'
Solutions:
- Check that the file exists in the current directory
- Verify __init__.py exists in package directories
- Check that sys.path includes the module's directory
- Install the package if it's external: pip install my_module
Error 2: Circular Imports
# module_a.py
from module_b import function_b

def function_a():
    return function_b()

# module_b.py
from module_a import function_a  # Circular import!

def function_b():
    return function_a()
Solution: Restructure the code, or move the import inside the function:
# module_b.py
def function_b():
    from module_a import function_a  # Import inside function
    return function_a()
Error 3: Relative Import Beyond Top-Level
# Attempting to go beyond package root
from ...something import anything # ValueError: attempted relative import beyond top-level package
Solution: Use absolute imports or restructure package hierarchy.
Error 4: Name Conflicts
# Shadowing standard library
import json # Our json.py file!
data = json.loads('{"key": "value"}') # AttributeError: module 'json' has no attribute 'loads'
Solution: Rename your file to avoid conflicts with standard library or installed packages.
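A quick way to diagnose shadowing is a module's __file__ attribute, which shows exactly which file Python imported:

```python
import json

# If __file__ pointed at a json.py inside your project directory,
# your file would be shadowing the standard library.
print(json.__file__)  # .../lib/python3.x/json/__init__.py
```

If the printed path is inside your project instead of the Python installation, you've found the culprit.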
Best Practices
1. Use Meaningful Names
# Bad
import utils
from helpers import func
# Good
import data_preprocessing_utils
from validation_helpers import validate_email
2. Keep __init__.py Clean
# Good __init__.py
"""Package for data processing utilities."""
from .preprocessing import clean_data, normalize
from .validation import validate_input
__all__ = ['clean_data', 'normalize', 'validate_input']
__version__ = '1.0.0'
Avoid complex logic in __init__.py - it runs on every import!
3. Organize by Functionality
# Good structure
ml_project/
βββ data/ # Data-related modules
βββ models/ # Model implementations
βββ training/ # Training logic
βββ evaluation/ # Evaluation metrics
βββ utils/ # General utilities
4. Use __all__ to Define Public API
# models.py
__all__ = ['RandomForest', 'NeuralNetwork'] # Public API
class RandomForest:
    pass

class NeuralNetwork:
    pass

class _PrivateHelper:  # Not exported
    pass
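To see __all__ in action without creating files, here is a sketch that builds a throwaway in-memory module and star-imports from it (demo_api is a made-up name used only for this demo):

```python
import sys
import types

# Build a module object in memory instead of on disk
mod = types.ModuleType("demo_api")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn(): return 'ok'\n"
    "def _private(): return 'hidden'\n",
    mod.__dict__,
)
sys.modules["demo_api"] = mod  # register it so import statements can find it

ns = {}
exec("from demo_api import *", ns)  # star import honors __all__
print("public_fn" in ns)  # True
print("_private" in ns)   # False
```

Only the names listed in __all__ are pulled in by the star import; everything else stays private.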
5. Document Your Modules
"""
Module: data_preprocessing
This module provides utilities for preprocessing raw data including:
- Data cleaning
- Normalization
- Feature extraction
Example:
from data_preprocessing import clean_data
cleaned = clean_data(raw_data)
"""
6. Avoid Circular Dependencies
Organize imports to create a dependency tree, not a web:
# Good architecture
config → utils → models → training → main
# Bad architecture (circular)
models → training → utils → models
Real-World Example: Complete ML Project
Let's build a proper structure for a machine learning project:
ml_classification/
├── __init__.py
├── setup.py              # For installation
├── data/
│   ├── __init__.py
│   ├── loader.py         # Data loading
│   └── preprocessor.py   # Data preprocessing
├── models/
│   ├── __init__.py
│   ├── base.py           # Base model class
│   ├── random_forest.py
│   └── neural_net.py
├── training/
│   ├── __init__.py
│   ├── trainer.py
│   └── callbacks.py
├── evaluation/
│   ├── __init__.py
│   └── metrics.py
└── utils/
    ├── __init__.py
    ├── config.py
    └── logging.py
data/loader.py:
"""Data loading utilities."""
import pandas as pd
from pathlib import Path
class DataLoader:
    def __init__(self, data_path: str):
        self.data_path = Path(data_path)

    def load_csv(self) -> pd.DataFrame:
        return pd.read_csv(self.data_path)
models/__init__.py:
"""Model implementations."""
from .random_forest import RandomForestModel
from .neural_net import NeuralNetModel
__all__ = ['RandomForestModel', 'NeuralNetModel']
Main script:
# main.py
from ml_classification.data.loader import DataLoader
from ml_classification.data.preprocessor import Preprocessor
from ml_classification.models import RandomForestModel
from ml_classification.training.trainer import Trainer
from ml_classification.evaluation.metrics import calculate_accuracy
# Load data
loader = DataLoader('data/train.csv')
data = loader.load_csv()
# Preprocess
preprocessor = Preprocessor()
X, y = preprocessor.prepare(data)
# Train model
model = RandomForestModel()
trainer = Trainer(model)
trainer.fit(X, y)
# Evaluate
accuracy = calculate_accuracy(model, X, y)
print(f"Accuracy: {accuracy}")
Making Your Package Installable
Create setup.py:
from setuptools import setup, find_packages
setup(
    name='ml_classification',
    version='1.0.0',
    packages=find_packages(),
    install_requires=[
        'numpy>=1.20.0',
        'pandas>=1.3.0',
        'scikit-learn>=1.0.0',
    ],
    author='Your Name',
    description='ML Classification Package',
    python_requires='>=3.8',
)
Install in development mode:
pip install -e .
Now you can import from anywhere:
from ml_classification import RandomForestModel
Advanced Topics
Lazy Imports for Performance
Lazy imports delay module loading until actually needed:
# Instead of importing at the top (slows startup):
# import heavy_ml_library

def train_model():
    # Import only when the function is called
    import heavy_ml_library
    model = heavy_ml_library.Model()
    return model
When to use:
- Large libraries that aren't always needed
- Speeding up script startup time
- Optional dependencies
Module Reloading (Development)
During development, reload modules without restarting Python:
import importlib
import my_module
# Make changes to my_module.py...
# Reload the module
importlib.reload(my_module)
Warning: Reloading can cause issues with existing instances. Use mainly in interactive sessions.
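A self-contained sketch of reload in action: it writes a throwaway module into a temp directory, imports it, edits the file on disk, and reloads (reload_demo is a made-up module name for this demo):

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module into a temp directory
mod_dir = tempfile.mkdtemp()
path = os.path.join(mod_dir, "reload_demo.py")
with open(path, "w") as f:
    f.write("VALUE = 1\n")
sys.path.insert(0, mod_dir)

import reload_demo
print(reload_demo.VALUE)  # 1

# Change the file, then reload to pick up the edit
with open(path, "w") as f:
    f.write("VALUE = 2  # edited\n")
importlib.reload(reload_demo)
print(reload_demo.VALUE)  # 2
```

Note that objects created from the old version of the module are not updated by the reload, which is why it's best kept to interactive sessions.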
Conditional Imports
Handle optional dependencies gracefully:
try:
    import matplotlib.pyplot as plt
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False

def plot_data(data):
    if not HAS_MATPLOTLIB:
        print("Matplotlib not available. Install with: pip install matplotlib")
        return
    plt.plot(data)
    plt.show()
Key Takeaways
- Modules are Python files; packages are directories with __init__.py
- Use absolute imports for clarity and maintainability
- Understand sys.path to troubleshoot import errors
- Organize code by functionality in separate packages
- Use __all__ to define your public API
- Avoid circular dependencies through proper architecture
- Make packages installable with setup.py for reusability
Proper module organization transforms chaotic code into professional, maintainable projects. Start small, refactor as you grow, and your future self will thank you!