Essential Python Modules: Requests, JSON, Datetime, OS for Data Science

Master Python's essential modules for data science. Learn requests for APIs, json for data exchange, datetime for time series, os for file operations, and more.

📅 Published: January 20, 2025 ✏️ Updated: February 28, 2025 By Ojaswi Athghara
#python #requests #json #datetime #os-module

Essential Python Modules: Requests, JSON, Datetime, OS for Data Science

The Modules Every Data Scientist Uses Daily

Every data science project needs these modules. Here's your complete guide!

Requests: HTTP for Humans

The requests library makes HTTP requests simple and Pythonic.

Basic HTTP Methods

import requests

# GET request - retrieve data
response = requests.get('https://api.github.com/users/python')
data = response.json()
print(f"Status: {response.status_code}")
print(f"Python has {data['public_repos']} public repos")

# POST request - send data
payload = {'username': 'alice', 'email': 'alice@example.com'}
response = requests.post('https://api.example.com/users', json=payload)

# PUT request - update data
updated_data = {'email': 'newemail@example.com'}
response = requests.put('https://api.example.com/users/123', json=updated_data)

# DELETE request
response = requests.delete('https://api.example.com/users/123')

Headers and Authentication

# Custom headers
headers = {
    'User-Agent': 'My App/1.0',
    'Accept': 'application/json',
    'Authorization': 'Bearer YOUR_TOKEN_HERE'
}

response = requests.get('https://api.example.com/data', headers=headers)

# Basic authentication
from requests.auth import HTTPBasicAuth
response = requests.get(
    'https://api.example.com/data',
    auth=HTTPBasicAuth('username', 'password')
)

# Bearer token (common for APIs)
headers = {'Authorization': f'Bearer {api_token}'}
response = requests.get('https://api.example.com/data', headers=headers)

Query Parameters

# Method 1: URL string
response = requests.get('https://api.example.com/search?q=python&limit=10')

# Method 2: params dict (cleaner!)
params = {
    'q': 'python',
    'limit': 10,
    'sort': 'stars',
    'order': 'desc'
}
response = requests.get('https://api.example.com/search', params=params)
print(response.url)  # See full URL with parameters

Error Handling

try:
    response = requests.get('https://api.example.com/data', timeout=5)
    
    # Raise exception for bad status codes (4xx, 5xx)
    response.raise_for_status()
    
    data = response.json()
    
except requests.exceptions.Timeout:
    print("Request timed out after 5 seconds")
except requests.exceptions.ConnectionError:
    print("Failed to connect to server")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error: {e}")
    print(f"Status code: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")

Sessions for Multiple Requests

# Session object persists cookies, headers across requests
session = requests.Session()
session.headers.update({'Authorization': f'Bearer {token}'})

# All requests use the same headers and cookies
response1 = session.get('https://api.example.com/users')
response2 = session.get('https://api.example.com/posts')
response3 = session.post('https://api.example.com/comments', json=data)

session.close()  # Clean up
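For flaky APIs, a session can also be configured with automatic retries via requests' `HTTPAdapter` and urllib3's `Retry`. A minimal sketch — the status codes and backoff values below are assumptions you should tune for your API:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on common transient errors,
# with exponential backoff between attempts
retry = Retry(
    total=3,
    backoff_factor=0.5,
    status_forcelist=[429, 500, 502, 503, 504]
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount('https://', adapter)
session.mount('http://', adapter)

# Requests through this session now retry transparently
```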

File Uploads

# Upload file (use a context manager so the file handle is closed)
with open('dataset.csv', 'rb') as f:
    response = requests.post('https://api.example.com/upload', files={'file': f})

# Upload with additional data
with open('report.pdf', 'rb') as f:
    files = {'file': ('report.pdf', f, 'application/pdf')}
    data = {'description': 'Q4 Report', 'category': 'financial'}
    response = requests.post('https://api.example.com/upload', files=files, data=data)

JSON: Data Interchange Format

JSON (JavaScript Object Notation) is the universal format for data exchange between systems.

Serialization and Deserialization

import json

# Python dict to JSON string (serialization)
data = {
    'name': 'Alice',
    'age': 25,
    'is_active': True,
    'skills': ['Python', 'ML', 'Data Science'],
    'projects': {'count': 10, 'featured': 3}
}

# dumps() = dump string
json_string = json.dumps(data, indent=2)
print(json_string)

# loads() = load string (deserialization)
parsed_data = json.loads(json_string)
print(parsed_data['name'])  # Alice
print(type(parsed_data))    # <class 'dict'>

File Operations

# Write JSON to file - dump()
with open('data.json', 'w') as f:
    json.dump(data, f, indent=2)

# Read JSON from file - load()
with open('data.json', 'r') as f:
    loaded_data = json.load(f)
    print(loaded_data)

Handling Special Data Types

from datetime import datetime
import json

# Problem: datetime isn't JSON serializable
data = {
    'timestamp': datetime.now(),
    'value': 42
}

# This fails!
# json.dumps(data)  # TypeError: Object of type datetime is not JSON serializable

# Solution 1: Convert to string manually
data['timestamp'] = data['timestamp'].isoformat()
json_string = json.dumps(data)

# Solution 2: Custom encoder
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

data = {'timestamp': datetime.now(), 'value': 42}
json_string = json.dumps(data, cls=DateTimeEncoder)

Pretty Printing and Formatting

data = {
    'users': [
        {'name': 'Alice', 'age': 25},
        {'name': 'Bob', 'age': 30}
    ]
}

# Compact (no whitespace) - pass separators; the default adds spaces
compact = json.dumps(data, separators=(',', ':'))
print(compact)
# {"users":[{"name":"Alice","age":25},{"name":"Bob","age":30}]}

# Pretty print (readable)
pretty = json.dumps(data, indent=2)
print(pretty)
# {
#   "users": [
#     {"name": "Alice", "age": 25},
#     {"name": "Bob", "age": 30}
#   ]
# }

# Sort keys alphabetically
sorted_json = json.dumps(data, indent=2, sort_keys=True)

Working with APIs

import requests
import json

# API returns JSON
response = requests.get('https://api.github.com/users/python')
data = response.json()  # Automatically parses JSON

# Save API response
with open('github_data.json', 'w') as f:
    json.dump(data, f, indent=2)

# Send JSON to API
payload = {
    'title': 'My Post',
    'body': 'Content here',
    'userId': 1
}
response = requests.post(
    'https://jsonplaceholder.typicode.com/posts',
    json=payload  # Automatically converts to JSON
)

Datetime: Time Series and Dates

The datetime module is essential for time series analysis and date handling.

Creating Datetime Objects

from datetime import datetime, date, time, timedelta

# Current date and time
now = datetime.now()
print(f"Current time: {now}")  # 2025-01-15 14:30:00.123456

# Current date only
today = date.today()
print(f"Today: {today}")  # 2025-01-15

# Specific datetime
specific = datetime(2025, 1, 15, 14, 30, 0)
print(specific)  # 2025-01-15 14:30:00

# Date only
specific_date = date(2025, 1, 15)

# Time only
specific_time = time(14, 30, 0)

Formatting Datetime (strftime)

now = datetime.now()

# Common formats
print(now.strftime("%Y-%m-%d"))           # 2025-01-15
print(now.strftime("%Y-%m-%d %H:%M:%S"))  # 2025-01-15 14:30:00
print(now.strftime("%B %d, %Y"))          # January 15, 2025
print(now.strftime("%d/%m/%Y"))           # 15/01/2025
print(now.strftime("%I:%M %p"))           # 02:30 PM

# Timestamp
timestamp = now.strftime("%Y%m%d_%H%M%S")  # 20250115_143000
print(f"log_{timestamp}.txt")

Parsing Strings to Datetime (strptime)

# Parse string to datetime
date_string = "2025-01-15"
parsed = datetime.strptime(date_string, "%Y-%m-%d")

# Different formats
date1 = datetime.strptime("15/01/2025", "%d/%m/%Y")
date2 = datetime.strptime("Jan 15, 2025", "%b %d, %Y")
date3 = datetime.strptime("2025-01-15 14:30:00", "%Y-%m-%d %H:%M:%S")

# ISO format (common in APIs)
iso_string = "2025-01-15T14:30:00"
iso_date = datetime.fromisoformat(iso_string)

Date Arithmetic

from datetime import timedelta

now = datetime.now()

# Add time
tomorrow = now + timedelta(days=1)
next_week = now + timedelta(weeks=1)
in_3_hours = now + timedelta(hours=3)
in_30_mins = now + timedelta(minutes=30)

# Subtract time
yesterday = now - timedelta(days=1)
last_month = now - timedelta(days=30)  # Approximately one month
an_hour_ago = now - timedelta(hours=1)

# Calculate difference
start = datetime(2025, 1, 1)
end = datetime(2025, 1, 15)
difference = end - start
print(f"Days: {difference.days}")           # 14
print(f"Total seconds: {difference.total_seconds()}")  # 1209600.0

Time Zones

from datetime import datetime, timezone
import pytz  # pip install pytz

# UTC time
utc_now = datetime.now(timezone.utc)
print(f"UTC: {utc_now}")

# Specific timezone
eastern = pytz.timezone('US/Eastern')
et_time = datetime.now(eastern)
print(f"Eastern: {et_time}")

# Convert between timezones
utc_time = datetime.now(pytz.UTC)
tokyo_time = utc_time.astimezone(pytz.timezone('Asia/Tokyo'))
print(f"Tokyo: {tokyo_time}")
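On Python 3.9+, the standard library's zoneinfo module covers the same ground without a third-party dependency. A small sketch:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib, Python 3.9+

# Timezone-aware datetime without pytz
utc_time = datetime(2025, 1, 15, 12, 0, tzinfo=ZoneInfo("UTC"))

# Convert to another timezone
tokyo_time = utc_time.astimezone(ZoneInfo("Asia/Tokyo"))
print(tokyo_time)  # 2025-01-15 21:00:00+09:00
```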

Practical Data Science Examples

# Filter recent data
def get_recent_data(data, days=7):
    """Get data from last N days."""
    cutoff = datetime.now() - timedelta(days=days)
    return [d for d in data if datetime.fromisoformat(d['timestamp']) > cutoff]

# Generate date range
def date_range(start_date, end_date):
    """Generate all dates between start and end."""
    current = start_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)

# Example usage
start = date(2025, 1, 1)
end = date(2025, 1, 10)
for d in date_range(start, end):
    print(d)

# Time-based filename
def generate_filename(prefix="data", extension="csv"):
    """Create timestamped filename."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{timestamp}.{extension}"

print(generate_filename())  # data_20250115_143000.csv

OS: File and Directory Operations

import os

# Get current directory
cwd = os.getcwd()
print(f"Working directory: {cwd}")

# List files
files = os.listdir('data/')
print(f"Files: {files}")

# Create directory
os.makedirs('results/models', exist_ok=True)

# Check if path exists
if os.path.exists('data.csv'):
    print("File found!")

# Join paths (cross-platform)
file_path = os.path.join('data', 'processed', 'dataset.csv')

# Get file info
file_size = os.path.getsize('data.csv')
print(f"File size: {file_size} bytes")
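The os module also exposes environment variables via os.environ — the usual place to keep API keys and config out of your code. A short sketch (MY_API_KEY and DATA_DIR are hypothetical variable names):

```python
import os

# Read a secret from the environment instead of hardcoding it
api_key = os.environ.get('MY_API_KEY')  # None if not set
if api_key is None:
    print("MY_API_KEY is not set")

# Set a variable for this process and its children (values must be strings)
os.environ['DATA_DIR'] = '/tmp/data'
print(os.environ['DATA_DIR'])
```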

Pathlib: Modern Path Handling

from pathlib import Path

# Create Path objects
data_dir = Path('data')
file_path = data_dir / 'dataset.csv'

# Check existence
if file_path.exists():
    # Read file
    content = file_path.read_text()
    
    # Get file info
    print(f"Size: {file_path.stat().st_size} bytes")
    print(f"Modified: {file_path.stat().st_mtime}")

# Create directories
Path('models/saved').mkdir(parents=True, exist_ok=True)

# Iterate files
for csv_file in data_dir.glob('*.csv'):
    print(f"Found: {csv_file.name}")

Pickle: Save Python Objects

import pickle

# Save model
model = {'type': 'RandomForest', 'accuracy': 0.95}
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load model (only unpickle files from sources you trust!)
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

Random: Generate Random Data

import random

# Random numbers
random_int = random.randint(1, 100)
random_float = random.uniform(0, 1)

# Random choice
colors = ['red', 'green', 'blue']
chosen = random.choice(colors)

# Random sample
sample = random.sample(range(100), 10)  # 10 unique numbers

# Shuffle list
data = [1, 2, 3, 4, 5]
random.shuffle(data)
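For reproducible experiments, seed the generator first — identical seeds produce identical sequences:

```python
import random

random.seed(42)
first_run = [random.randint(1, 100) for _ in range(5)]

random.seed(42)  # Same seed -> same sequence
second_run = [random.randint(1, 100) for _ in range(5)]

print(first_run == second_run)  # True
```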

Best Practices

1. Always Set Timeouts for Requests

# Bad - can hang forever
response = requests.get(url)

# Good - set timeout
response = requests.get(url, timeout=5)

# Better - separate connect and read timeouts
response = requests.get(url, timeout=(3, 10))  # 3s connect, 10s read

2. Use Context Managers

# Bad - might not close
session = requests.Session()
response = session.get(url)
session.close()

# Good - auto cleanup
with requests.Session() as session:
    response = session.get(url)

3. Handle JSON Encoding Errors

import json

try:
    data = json.loads(json_string)
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
    print(f"Position: line {e.lineno}, column {e.colno}")

4. Use ISO Format for Dates in JSON

# Serializing datetime
from datetime import datetime

data = {
    'timestamp': datetime.now().isoformat(),  # Standard format
    'value': 42
}

# Deserializing
import json
from datetime import datetime

data = json.loads(json_string)
timestamp = datetime.fromisoformat(data['timestamp'])

5. Use pathlib for Modern Path Handling

from pathlib import Path

# Good - cross-platform, object-oriented
data_dir = Path('data')
file_path = data_dir / 'dataset.csv'

# Older equivalent with os.path.join
import os
file_path = os.path.join('data', 'dataset.csv')

Complete Example: Data Pipeline

Here's a complete example combining all modules:

import requests
import json
from datetime import datetime
from pathlib import Path
import time

class DataCollector:
    """Collect data from API and save locally."""
    
    def __init__(self, base_url, output_dir='data/raw'):
        self.base_url = base_url
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        
        # Create session for reuse
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'DataCollector/1.0',
            'Accept': 'application/json'
        })
    
    def fetch_data(self, endpoint, params=None):
        """Fetch data from API endpoint."""
        url = f"{self.base_url}/{endpoint}"
        
        try:
            response = self.session.get(
                url,
                params=params,
                timeout=(3, 10)
            )
            response.raise_for_status()
            return response.json()
            
        except requests.exceptions.Timeout:
            print(f"Timeout fetching {url}")
            return None
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
            return None
    
    def save_data(self, data, prefix='data'):
        """Save data with timestamp."""
        if data is None:
            return None
        
        # Add metadata
        data['_metadata'] = {
            'collected_at': datetime.now().isoformat(),
            'collector': 'DataCollector/1.0'
        }
        
        # Generate filename
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f"{prefix}_{timestamp}.json"
        filepath = self.output_dir / filename
        
        # Save JSON
        with open(filepath, 'w') as f:
            json.dump(data, f, indent=2)
        
        print(f"✅ Saved: {filepath}")
        return filepath
    
    def collect(self, endpoint, params=None, prefix='data'):
        """Collect and save data."""
        print(f"📡 Fetching {endpoint}...")
        data = self.fetch_data(endpoint, params)
        
        if data:
            return self.save_data(data, prefix)
        return None
    
    def close(self):
        """Clean up resources."""
        self.session.close()

# Usage
collector = DataCollector('https://api.github.com')

# Collect user data
collector.collect('users/python', prefix='github_user')

# Collect with parameters
params = {'q': 'python', 'sort': 'stars', 'order': 'desc'}
collector.collect('search/repositories', params, prefix='github_repos')

# Clean up
collector.close()

Common Pitfalls to Avoid

1. Not Handling Request Exceptions

# Bad - can crash your program
response = requests.get(url)
data = response.json()

# Good - proper error handling
try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
    data = None

2. Forgetting Timezone Information

# Bad - naive datetime
now = datetime.now()  # No timezone!

# Good - timezone aware
from datetime import timezone
now = datetime.now(timezone.utc)

3. Not Using Custom JSON Encoders

Always handle special types like datetime when serializing to JSON to avoid TypeErrors!
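Besides the custom encoder shown earlier, a quick fallback is json.dumps' `default` parameter, which is called for any object the encoder can't serialize:

```python
import json
from datetime import datetime

data = {'timestamp': datetime(2025, 1, 15, 14, 30), 'value': 42}

# default=str converts any non-serializable object via str()
json_string = json.dumps(data, default=str)
print(json_string)  # {"timestamp": "2025-01-15 14:30:00", "value": 42}
```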

Key Takeaways

  1. requests - Simple HTTP client for APIs
    • Use sessions for multiple requests
    • Always set timeouts
    • Handle errors properly
  2. json - Universal data exchange format
    • dumps/loads for strings
    • dump/load for files
    • Use custom encoders for special types
  3. datetime - Time series and date handling
    • Use strftime to format dates
    • Use strptime to parse strings
    • timedelta for date arithmetic
  4. os/pathlib - File system operations
    • Prefer pathlib for modern code
    • Use exist_ok=True when creating directories
    • Always use context managers for files
  5. Combined power - These modules work together perfectly for data pipelines, API integration, and file operations

Master these modules and you'll handle 90% of data science workflows efficiently!


If this guide helped you understand these modules, I'd love to hear about it! Connect with me on Twitter or LinkedIn.

Support My Work

If this guide helped you master Python's essential modules like requests, json, and datetime, build web scrapers, work with APIs, or handle time-based data, I'd really appreciate your support! Creating comprehensive, practical Python tutorials with real-world examples takes significant time and effort. Your support helps me continue sharing knowledge and creating more helpful resources for Python developers.

☕ Buy me a coffee - Every contribution, big or small, means the world to me and keeps me motivated to create more content!


Cover image by Towfiqu barbhuiya on Unsplash


© ojaswiat.com 2025-2027