Essential Python Modules: Requests, JSON, Datetime, OS for Data Science
Master Python's essential modules for data science: requests for APIs, json for data interchange, datetime for time series, os and pathlib for file operations, and more.

The Modules Every Data Scientist Uses Daily
Every data science project needs these modules. Here's your complete guide!
Requests: HTTP for Humans
The requests library makes HTTP requests simple and Pythonic.
Basic HTTP Methods
import requests
# GET request - retrieve data
response = requests.get('https://api.github.com/users/python')
data = response.json()
print(f"Status: {response.status_code}")
print(f"Python has {data['public_repos']} public repos")
# POST request - send data
payload = {'username': 'alice', 'email': 'alice@example.com'}
response = requests.post('https://api.example.com/users', json=payload)
# PUT request - update data
updated_data = {'email': 'newemail@example.com'}
response = requests.put('https://api.example.com/users/123', json=updated_data)
# DELETE request
response = requests.delete('https://api.example.com/users/123')
Headers and Authentication
# Custom headers
headers = {
    'User-Agent': 'My App/1.0',
    'Accept': 'application/json',
    'Authorization': 'Bearer YOUR_TOKEN_HERE'
}
response = requests.get('https://api.example.com/data', headers=headers)
# Basic authentication
from requests.auth import HTTPBasicAuth
response = requests.get(
    'https://api.example.com/data',
    auth=HTTPBasicAuth('username', 'password')
)
# Bearer token (common for APIs)
headers = {'Authorization': f'Bearer {api_token}'}
response = requests.get('https://api.example.com/data', headers=headers)
Query Parameters
# Method 1: URL string
response = requests.get('https://api.example.com/search?q=python&limit=10')
# Method 2: params dict (cleaner!)
params = {
    'q': 'python',
    'limit': 10,
    'sort': 'stars',
    'order': 'desc'
}
response = requests.get('https://api.example.com/search', params=params)
print(response.url) # See full URL with parameters
Error Handling
try:
    response = requests.get('https://api.example.com/data', timeout=5)
    # Raise exception for bad status codes (4xx, 5xx)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.Timeout:
    print("Request timed out after 5 seconds")
except requests.exceptions.ConnectionError:
    print("Failed to connect to server")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error: {e}")
    print(f"Status code: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
Sessions for Multiple Requests
# Session object persists cookies, headers across requests
session = requests.Session()
session.headers.update({'Authorization': f'Bearer {token}'})
# All requests use the same headers and cookies
response1 = session.get('https://api.example.com/users')
response2 = session.get('https://api.example.com/posts')
response3 = session.post('https://api.example.com/comments', json=data)
session.close() # Clean up
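Another common reason to reach for a Session is automatic retries on transient failures. This is a minimal sketch using urllib3's `Retry` with requests' `HTTPAdapter`; the endpoint URL is a placeholder:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times with exponential backoff, but only
# for status codes that usually indicate a transient problem
retry = Retry(
    total=3,
    backoff_factor=0.5,
    status_forcelist=[429, 500, 502, 503, 504],
)

session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retry))

# Every HTTPS request through this session now retries automatically:
# response = session.get('https://api.example.com/data', timeout=5)
session.close()
```

The `backoff_factor` spaces the retries out (0.5s, 1s, 2s, ...) so you don't hammer a struggling server.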
File Uploads
# Upload file (use a context manager so the file handle is closed)
with open('dataset.csv', 'rb') as f:
    files = {'file': f}
    response = requests.post('https://api.example.com/upload', files=files)
# Upload with an explicit filename, content type, and extra form data
with open('report.pdf', 'rb') as f:
    files = {'file': ('report.pdf', f, 'application/pdf')}
    data = {'description': 'Q4 Report', 'category': 'financial'}
    response = requests.post('https://api.example.com/upload', files=files, data=data)
JSON: Data Interchange Format
JSON (JavaScript Object Notation) is the universal format for data exchange between systems.
Serialization and Deserialization
import json
# Python dict to JSON string (serialization)
data = {
    'name': 'Alice',
    'age': 25,
    'is_active': True,
    'skills': ['Python', 'ML', 'Data Science'],
    'projects': {'count': 10, 'featured': 3}
}
# dumps() = dump string
json_string = json.dumps(data, indent=2)
print(json_string)
# loads() = load string (deserialization)
parsed_data = json.loads(json_string)
print(parsed_data['name']) # Alice
print(type(parsed_data)) # <class 'dict'>
File Operations
# Write JSON to file - dump()
with open('data.json', 'w') as f:
    json.dump(data, f, indent=2)
# Read JSON from file - load()
with open('data.json', 'r') as f:
    loaded_data = json.load(f)
print(loaded_data)
Handling Special Data Types
from datetime import datetime
import json
# Problem: datetime isn't JSON serializable
data = {
    'timestamp': datetime.now(),
    'value': 42
}
# This fails!
# json.dumps(data) # TypeError: Object of type datetime is not JSON serializable
# Solution 1: Convert to string manually
data['timestamp'] = data['timestamp'].isoformat()
json_string = json.dumps(data)
# Solution 2: Custom encoder
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

data = {'timestamp': datetime.now(), 'value': 42}
json_string = json.dumps(data, cls=DateTimeEncoder)
Pretty Printing and Formatting
data = {
    'users': [
        {'name': 'Alice', 'age': 25},
        {'name': 'Bob', 'age': 30}
    ]
}
# Compact (default - single line, with spaces after separators)
compact = json.dumps(data)
print(compact)
# {"users": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]}
# Truly minimal - strip all whitespace via separators
minimal = json.dumps(data, separators=(',', ':'))
# {"users":[{"name":"Alice","age":25},{"name":"Bob","age":30}]}
# Pretty print (readable)
pretty = json.dumps(data, indent=2)
print(pretty)
# {
#   "users": [
#     {"name": "Alice", "age": 25},
#     {"name": "Bob", "age": 30}
#   ]
# }
# Sort keys alphabetically
sorted_json = json.dumps(data, indent=2, sort_keys=True)
Working with APIs
import requests
import json
# API returns JSON
response = requests.get('https://api.github.com/users/python')
data = response.json() # Automatically parses JSON
# Save API response
with open('github_data.json', 'w') as f:
    json.dump(data, f, indent=2)
# Send JSON to API
payload = {
    'title': 'My Post',
    'body': 'Content here',
    'userId': 1
}
response = requests.post(
    'https://jsonplaceholder.typicode.com/posts',
    json=payload  # Automatically converts to JSON
)
Datetime: Time Series and Dates
The datetime module is essential for time series analysis and date handling.
Creating Datetime Objects
from datetime import datetime, date, time, timedelta
# Current date and time
now = datetime.now()
print(f"Current time: {now}") # 2025-01-15 14:30:00.123456
# Current date only
today = date.today()
print(f"Today: {today}") # 2025-01-15
# Specific datetime
specific = datetime(2025, 1, 15, 14, 30, 0)
print(specific) # 2025-01-15 14:30:00
# Date only
specific_date = date(2025, 1, 15)
# Time only
specific_time = time(14, 30, 0)
Formatting Datetime (strftime)
now = datetime.now()
# Common formats
print(now.strftime("%Y-%m-%d")) # 2025-01-15
print(now.strftime("%Y-%m-%d %H:%M:%S")) # 2025-01-15 14:30:00
print(now.strftime("%B %d, %Y")) # January 15, 2025
print(now.strftime("%d/%m/%Y")) # 15/01/2025
print(now.strftime("%I:%M %p")) # 02:30 PM
# Timestamp
timestamp = now.strftime("%Y%m%d_%H%M%S") # 20250115_143000
print(f"log_{timestamp}.txt")
Parsing Strings to Datetime (strptime)
# Parse string to datetime
date_string = "2025-01-15"
parsed = datetime.strptime(date_string, "%Y-%m-%d")
# Different formats
date1 = datetime.strptime("15/01/2025", "%d/%m/%Y")
date2 = datetime.strptime("Jan 15, 2025", "%b %d, %Y")
date3 = datetime.strptime("2025-01-15 14:30:00", "%Y-%m-%d %H:%M:%S")
# ISO format (common in APIs)
iso_string = "2025-01-15T14:30:00"
iso_date = datetime.fromisoformat(iso_string)
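One caveat worth knowing: before Python 3.11, `fromisoformat()` rejects the trailing `Z` that many APIs use to mark UTC. A small, portable workaround is to replace the suffix with an explicit offset:

```python
from datetime import datetime

iso_from_api = "2025-01-15T14:30:00Z"  # 'Z' suffix means UTC

# Replace 'Z' with '+00:00' so older Pythons can parse it too
parsed = datetime.fromisoformat(iso_from_api.replace("Z", "+00:00"))
print(parsed.tzinfo)  # UTC
```

On Python 3.11+ the `Z` suffix is accepted directly, but the replacement is harmless there as well.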
Date Arithmetic
from datetime import timedelta
now = datetime.now()
# Add time
tomorrow = now + timedelta(days=1)
next_week = now + timedelta(weeks=1)
in_3_hours = now + timedelta(hours=3)
in_30_mins = now + timedelta(minutes=30)
# Subtract time
yesterday = now - timedelta(days=1)
last_month = now - timedelta(days=30)
an_hour_ago = now - timedelta(hours=1)
# Calculate difference
start = datetime(2025, 1, 1)
end = datetime(2025, 1, 15)
difference = end - start
print(f"Days: {difference.days}") # 14
print(f"Total seconds: {difference.total_seconds()}") # 1209600.0
Time Zones
from datetime import datetime, timezone
import pytz # pip install pytz
# UTC time
utc_now = datetime.now(timezone.utc)
print(f"UTC: {utc_now}")
# Specific timezone
eastern = pytz.timezone('US/Eastern')
et_time = datetime.now(eastern)
print(f"Eastern: {et_time}")
# Convert between timezones
utc_time = datetime.now(pytz.UTC)
tokyo_time = utc_time.astimezone(pytz.timezone('Asia/Tokyo'))
print(f"Tokyo: {tokyo_time}")
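Since Python 3.9, the standard-library zoneinfo module covers most of what pytz does without a third-party dependency (it relies on the system tz database, or the `tzdata` package on Windows). A minimal sketch:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Attach a time zone when constructing or converting
utc_now = datetime.now(ZoneInfo("UTC"))
tokyo_now = utc_now.astimezone(ZoneInfo("Asia/Tokyo"))

print(f"Tokyo offset: {tokyo_now.utcoffset()}")  # JST is UTC+9
```

Unlike pytz, `ZoneInfo` objects can be passed straight to the `datetime` constructor's `tzinfo` argument without a `localize()` step.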
Practical Data Science Examples
# Filter recent data
def get_recent_data(data, days=7):
    """Get data from last N days."""
    cutoff = datetime.now() - timedelta(days=days)
    return [d for d in data if datetime.fromisoformat(d['timestamp']) > cutoff]
# Generate date range
def date_range(start_date, end_date):
    """Generate all dates between start and end."""
    current = start_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)

# Example usage
start = date(2025, 1, 1)
end = date(2025, 1, 10)
for d in date_range(start, end):
    print(d)
# Time-based filename
def generate_filename(prefix="data", extension="csv"):
    """Create timestamped filename."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{timestamp}.{extension}"
print(generate_filename()) # data_20250115_143000.csv
OS: File and Directory Operations
import os
# Get current directory
cwd = os.getcwd()
print(f"Working directory: {cwd}")
# List files
files = os.listdir('data/')
print(f"Files: {files}")
# Create directory
os.makedirs('results/models', exist_ok=True)
# Check if path exists
if os.path.exists('data.csv'):
    print("File found!")
# Join paths (cross-platform)
file_path = os.path.join('data', 'processed', 'dataset.csv')
# Get file info
file_size = os.path.getsize('data.csv')
print(f"File size: {file_size} bytes")
Pathlib: Modern Path Handling
from pathlib import Path
# Create Path objects
data_dir = Path('data')
file_path = data_dir / 'dataset.csv'
# Check existence
if file_path.exists():
    # Read file
    content = file_path.read_text()
    # Get file info
    print(f"Size: {file_path.stat().st_size} bytes")
    print(f"Modified: {file_path.stat().st_mtime}")
# Create directories
Path('models/saved').mkdir(parents=True, exist_ok=True)
# Iterate files
for csv_file in data_dir.glob('*.csv'):
    print(f"Found: {csv_file.name}")
Pickle: Save Python Objects
import pickle
# Save model
model = {'type': 'RandomForest', 'accuracy': 0.95}
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
# Load model
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
Random: Generate Random Data
import random
# Random numbers
random_int = random.randint(1, 100)
random_float = random.uniform(0, 1)
# Random choice
colors = ['red', 'green', 'blue']
chosen = random.choice(colors)
# Random sample
sample = random.sample(range(100), 10) # 10 unique numbers
# Shuffle list
data = [1, 2, 3, 4, 5]
random.shuffle(data)
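For reproducible experiments, seed the generator first; the same seed always produces the same sequence, which is essential when you need to rerun an analysis and get identical "random" splits:

```python
import random

random.seed(42)           # fix the seed for reproducibility
first = random.sample(range(100), 5)

random.seed(42)           # reset to the same seed
second = random.sample(range(100), 5)

print(first == second)    # True - identical sequences
```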
Best Practices
1. Always Set Timeouts for Requests
# Bad - can hang forever
response = requests.get(url)
# Good - set timeout
response = requests.get(url, timeout=5)
# Better - separate connect and read timeouts
response = requests.get(url, timeout=(3, 10)) # 3s connect, 10s read
2. Use Context Managers
# Bad - might not close
session = requests.Session()
response = session.get(url)
session.close()
# Good - auto cleanup
with requests.Session() as session:
    response = session.get(url)
3. Handle JSON Encoding Errors
import json
try:
    data = json.loads(json_string)
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
    print(f"Position: line {e.lineno}, column {e.colno}")
4. Use ISO Format for Dates in JSON
# Serializing datetime
from datetime import datetime
data = {
    'timestamp': datetime.now().isoformat(),  # Standard format
    'value': 42
}
# Deserializing
import json
from datetime import datetime
data = json.loads(json_string)
timestamp = datetime.fromisoformat(data['timestamp'])
5. Use pathlib for Modern Path Handling
from pathlib import Path
# Good - cross-platform, object-oriented
data_dir = Path('data')
file_path = data_dir / 'dataset.csv'
# Instead of os.path.join
import os
file_path = os.path.join('data', 'dataset.csv')
Complete Example: Data Pipeline
Here's a complete example combining all modules:
import requests
import json
from datetime import datetime
from pathlib import Path

class DataCollector:
    """Collect data from API and save locally."""

    def __init__(self, base_url, output_dir='data/raw'):
        self.base_url = base_url
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        # Create session for reuse
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'DataCollector/1.0',
            'Accept': 'application/json'
        })

    def fetch_data(self, endpoint, params=None):
        """Fetch data from API endpoint."""
        url = f"{self.base_url}/{endpoint}"
        try:
            response = self.session.get(
                url,
                params=params,
                timeout=(3, 10)
            )
            response.raise_for_status()
            return response.json()
        except requests.exceptions.Timeout:
            print(f"Timeout fetching {url}")
            return None
        except requests.exceptions.RequestException as e:
            print(f"Error fetching {url}: {e}")
            return None

    def save_data(self, data, prefix='data'):
        """Save data with timestamp."""
        if data is None:
            return None
        # Add metadata
        data['_metadata'] = {
            'collected_at': datetime.now().isoformat(),
            'collector': 'DataCollector/1.0'
        }
        # Generate filename
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f"{prefix}_{timestamp}.json"
        filepath = self.output_dir / filename
        # Save JSON
        with open(filepath, 'w') as f:
            json.dump(data, f, indent=2)
        print(f"✅ Saved: {filepath}")
        return filepath

    def collect(self, endpoint, params=None, prefix='data'):
        """Collect and save data."""
        print(f"📡 Fetching {endpoint}...")
        data = self.fetch_data(endpoint, params)
        if data:
            return self.save_data(data, prefix)
        return None

    def close(self):
        """Clean up resources."""
        self.session.close()
# Usage
collector = DataCollector('https://api.github.com')
# Collect user data
collector.collect('users/python', prefix='github_user')
# Collect with parameters
params = {'q': 'python', 'sort': 'stars', 'order': 'desc'}
collector.collect('search/repositories', params, prefix='github_repos')
# Clean up
collector.close()
Common Pitfalls to Avoid
1. Not Handling Request Exceptions
# Bad - can crash your program
response = requests.get(url)
data = response.json()
# Good - proper error handling
try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
    data = None
2. Forgetting Timezone Information
# Bad - naive datetime
now = datetime.now() # No timezone!
# Good - timezone aware
from datetime import timezone
now = datetime.now(timezone.utc)
3. Not Using Custom JSON Encoders
Always handle special types like datetime when serializing to JSON to avoid TypeErrors!
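If writing a full `JSONEncoder` subclass (as shown earlier) feels heavyweight, `json.dumps` also accepts a `default=` callable that is invoked for any object it can't serialize natively. Passing `str` is a common quick fix, though it stringifies every unserializable type the same way:

```python
import json
from datetime import datetime

data = {'timestamp': datetime(2025, 1, 15, 14, 30), 'value': 42}

# default=str is called for any object json can't serialize natively
json_string = json.dumps(data, default=str)
print(json_string)  # {"timestamp": "2025-01-15 14:30:00", "value": 42}
```

Prefer `isoformat()` or a custom encoder when you need to round-trip the value back into a `datetime`, since `str()` output is not guaranteed to stay parseable across types.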
Key Takeaways
- requests - Simple HTTP client for APIs
  - Use sessions for multiple requests
  - Always set timeouts
  - Handle errors properly
- json - Universal data exchange format
  - dumps/loads for strings, dump/load for files
  - Use custom encoders for special types
- datetime - Time series and date handling
  - Use strftime to format dates
  - Use strptime to parse strings
  - Use timedelta for date arithmetic
- os/pathlib - File system operations
  - Prefer pathlib for modern code
  - Use exist_ok=True when creating directories
  - Always use context managers for files
- Combined power - These modules work together perfectly for data pipelines, API integration, and file operations
Master these modules and you'll handle 90% of data science workflows efficiently!