NumPy for Beginners: Complete Data Analysis Fundamentals Guide
Learn NumPy from scratch for data analysis. Beginner-friendly guide covering arrays, operations, statistics, and practical data manipulation for aspiring data scientists and Python developers.

My First Encounter with NumPy
"Just use NumPy," they said. "It's easy," they said. I stared at my screen, completely lost. Arrays? Broadcasting? Vectorization? The documentation assumed I already knew what I was doing.
Sound familiar? I've been there. NumPy seemed like this magical tool everyone used, but nobody explained it in plain English. After months of frustration and breakthroughs, I finally "got it."
This guide is what I wish I had when starting. No jargon. No assumptions. Just clear explanations and practical examples that will take you from "What's NumPy?" to confidently analyzing data.
What is NumPy and Why Should You Care?
NumPy (Numerical Python) is the foundation of data science in Python. Think of it as a supercharged calculator that can handle millions of numbers at once.
Why NumPy Over Python Lists?
import numpy as np
import time
# Create a million numbers
numbers_list = list(range(1000000))
numbers_array = np.array(numbers_list)
# Time a simple operation: multiply everything by 2
start = time.perf_counter()  # perf_counter is designed for timing short code
doubled_list = [x * 2 for x in numbers_list]
list_time = time.perf_counter() - start
start = time.perf_counter()
doubled_array = numbers_array * 2
numpy_time = time.perf_counter() - start
print(f"Python list: {list_time:.4f} seconds")
print(f"NumPy array: {numpy_time:.4f} seconds")
print(f"NumPy is {list_time/numpy_time:.0f}x faster!")
Output:
Python list: 0.0523 seconds
NumPy array: 0.0013 seconds
NumPy is 40x faster!
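That's not a typo. For element-wise numerical operations like this one, NumPy is often 10-100x faster than a pure Python loop. The exact speedup depends on your machine, the array size, and the operation, so your numbers will differ from mine.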
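A single timing run can be noisy. If you want a steadier comparison, one option is the standard library's timeit module. This is only a sketch: running each version 3 times is an arbitrary choice of mine, not part of the original benchmark.

```python
import timeit
import numpy as np

numbers_list = list(range(1_000_000))
numbers_array = np.array(numbers_list)

# Run each version 3 times and keep the fastest run,
# which is the one least disturbed by other programs on your machine
list_time = min(timeit.repeat(lambda: [x * 2 for x in numbers_list], number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: numbers_array * 2, number=1, repeat=3))

print(f"Python list: {list_time:.4f} seconds")
print(f"NumPy array: {numpy_time:.4f} seconds")
print(f"Speedup: {list_time / numpy_time:.0f}x")
```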
Installing NumPy
# Run ONE of these in your terminal (not inside Python)
# Using pip
pip install numpy
# Using conda
conda install numpy
# Then, inside Python, import NumPy (standard convention)
import numpy as np
# Check version
print(np.__version__)
Your First NumPy Array
Think of arrays as containers that hold numbers in organized rows and columns.
Creating Arrays
import numpy as np
# From a Python list
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print(f"Python list: {my_list}")
print(f"NumPy array: {my_array}")
print(f"Type: {type(my_array)}")
# Shorthand
arr = np.array([1, 2, 3, 4, 5])
print(f"\nArray: {arr}")
Understanding Array Shapes
# 1D array (like a single row)
arr_1d = np.array([1, 2, 3, 4])
print(f"1D array: {arr_1d}")
print(f"Shape: {arr_1d.shape}") # (4,) means 4 elements
# 2D array (like a table)
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(f"\n2D array:\n{arr_2d}")
print(f"Shape: {arr_2d.shape}") # (2, 3) means 2 rows, 3 columns
# 3D array (like stacked tables)
arr_3d = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])
print(f"\n3D array:\n{arr_3d}")
print(f"Shape: {arr_3d.shape}") # (2, 2, 2)
Quick Array Creation Functions
# Array of zeros
zeros = np.zeros(5)
print(f"Zeros: {zeros}")
# 2D zeros
zeros_2d = np.zeros((3, 4)) # 3 rows, 4 columns
print(f"\nZeros 2D:\n{zeros_2d}")
# Array of ones
ones = np.ones(5)
print(f"\nOnes: {ones}")
# Array with range of numbers
range_arr = np.arange(10) # 0 to 9
print(f"\nRange 0-9: {range_arr}")
range_arr = np.arange(5, 15) # 5 to 14
print(f"Range 5-14: {range_arr}")
range_arr = np.arange(0, 10, 2) # 0 to 9, step by 2
print(f"Even numbers: {range_arr}")
# Evenly spaced numbers
spaced = np.linspace(0, 10, 5) # 5 numbers from 0 to 10
print(f"\nLinspace: {spaced}")
# Random numbers
random_floats = np.random.rand(5) # 5 random numbers between 0 and 1
print(f"\nRandom: {random_floats}")
random_int = np.random.randint(1, 100, size=10) # 10 random integers from 1 to 99 (100 is excluded)
print(f"Random integers: {random_int}")
Accessing Array Elements
Just like lists, but more powerful.
Indexing
arr = np.array([10, 20, 30, 40, 50])
# Access single element
print(f"First element: {arr[0]}")
print(f"Last element: {arr[-1]}")
print(f"Third element: {arr[2]}")
# 2D array indexing
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
print(f"\nFull array:\n{arr_2d}")
print(f"Element at row 0, col 1: {arr_2d[0, 1]}") # 2
print(f"Element at row 2, col 2: {arr_2d[2, 2]}") # 9
# Get entire rows or columns
print(f"\nFirst row: {arr_2d[0]}")
print(f"Second column: {arr_2d[:, 1]}") # : means "all rows"
Slicing (Getting Multiple Elements)
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Get range [start:end]
print(f"Elements 2 to 5: {arr[2:6]}") # Note: end index not included
print(f"First 5 elements: {arr[:5]}")
print(f"Last 3 elements: {arr[-3:]}")
print(f"Every other element: {arr[::2]}")
# 2D slicing
arr_2d = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])
print(f"\nFirst 2 rows, last 2 columns:")
print(arr_2d[:2, -2:])
Basic Array Operations
The fun part: actually doing math!
Arithmetic Operations
arr = np.array([1, 2, 3, 4, 5])
# Add/subtract/multiply/divide with numbers
print(f"Original: {arr}")
print(f"Add 10: {arr + 10}")
print(f"Multiply by 2: {arr * 2}")
print(f"Divide by 2: {arr / 2}")
print(f"Power of 2: {arr ** 2}")
# Operations between arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([10, 20, 30])
print(f"\narr1: {arr1}")
print(f"arr2: {arr2}")
print(f"arr1 + arr2: {arr1 + arr2}")
print(f"arr1 * arr2: {arr1 * arr2}")
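Adding a single number to a whole array is the simplest case of what NumPy calls broadcasting, one of the terms that confused me at the start. The same idea works between arrays of different shapes when the shapes are compatible. Here is a small sketch; the names table and per_column are just mine for illustration.

```python
import numpy as np

# A 2D table: 2 rows, 3 columns
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# A 1D array with one value per column
per_column = np.array([10, 20, 30])

# NumPy "stretches" per_column across both rows before adding
result = table + per_column
print(result)
# [[11 22 33]
#  [14 25 36]]
```

If the shapes don't line up (say, a 1D array with 2 values against 3 columns), NumPy raises a ValueError instead of guessing.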
Comparison Operations
arr = np.array([1, 2, 3, 4, 5])
# Create boolean arrays
print(f"Array: {arr}")
print(f"Greater than 3: {arr > 3}")
print(f"Equal to 3: {arr == 3}")
print(f"Less than or equal to 2: {arr <= 2}")
# Use comparisons to filter
filtered = arr[arr > 3]
print(f"\nElements > 3: {filtered}")
even = arr[arr % 2 == 0]
print(f"Even numbers: {even}")
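Sooner or later you'll want to filter on more than one condition. Python's and/or keywords don't work on arrays, so NumPy uses &, | and ~ instead. A minimal sketch (the variable names are mine):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# AND: use & and wrap each condition in parentheses
between = arr[(arr > 1) & (arr < 5)]
print(f"Greater than 1 AND less than 5: {between}")  # [2 3 4]

# OR: use |
ends = arr[(arr == 1) | (arr == 5)]
print(f"Equal to 1 OR equal to 5: {ends}")  # [1 5]

# NOT: use ~
not_three = arr[~(arr == 3)]
print(f"NOT equal to 3: {not_three}")  # [1 2 4 5]
```

The parentheses matter: & binds more tightly than > or <, so leaving them out gives an error or the wrong result.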
Essential Functions for Data Analysis
The bread and butter of data analysis.
Statistical Functions
data = np.array([23, 45, 67, 12, 89, 34, 56, 78, 90, 21])
print("Data:", data)
print(f"\nMean (average): {np.mean(data)}")
print(f"Median (middle value): {np.median(data)}")
print(f"Standard deviation: {np.std(data):.2f}")
print(f"Variance: {np.var(data):.2f}")
print(f"\nMinimum: {np.min(data)}")
print(f"Maximum: {np.max(data)}")
print(f"Range: {np.max(data) - np.min(data)}")
print(f"\nSum: {np.sum(data)}")
print(f"Product: {np.prod(data)}")
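One detail that tripped me up: by default, np.std and np.var compute the population version (dividing by N). If your data is a sample and you want the sample version (dividing by N - 1), pass ddof=1. A quick sketch using the same data:

```python
import numpy as np

data = np.array([23, 45, 67, 12, 89, 34, 56, 78, 90, 21])

# Default ddof=0: population standard deviation (divide by N)
population_std = np.std(data)

# ddof=1: sample standard deviation (divide by N - 1)
sample_std = np.std(data, ddof=1)

print(f"Population std: {population_std:.2f}")
print(f"Sample std:     {sample_std:.2f}")
```

The sample value is always slightly larger, and the gap shrinks as the dataset grows.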
Working with 2D Data (Like Spreadsheets)
# Student grades: rows=students, columns=subjects
grades = np.array([[85, 92, 78],   # Student 1
                   [90, 88, 95],   # Student 2
                   [76, 82, 80],   # Student 3
                   [92, 95, 89]])  # Student 4
print("Grades table:")
print(grades)
# Statistics for each student (across columns)
print(f"\nEach student's average:")
print(np.mean(grades, axis=1))
# Statistics for each subject (across rows)
print(f"\nEach subject's average:")
print(np.mean(grades, axis=0))
# Overall statistics
print(f"\nOverall average: {np.mean(grades):.2f}")
print(f"Highest grade: {np.max(grades)}")
print(f"Lowest grade: {np.min(grades)}")
Reshaping Arrays
Change how your data is organized.
# Start with 1D array
arr = np.arange(12)
print(f"Original: {arr}")
# Reshape to 2D
arr_2d = arr.reshape(3, 4) # 3 rows, 4 columns
print(f"\nReshaped to 3x4:\n{arr_2d}")
# Reshape to different dimensions
arr_2d = arr.reshape(4, 3) # 4 rows, 3 columns
print(f"\nReshaped to 4x3:\n{arr_2d}")
# Flatten back to 1D
flat = arr_2d.flatten()
print(f"\nFlattened: {flat}")
# Transpose (flip rows and columns)
transposed = arr_2d.T
print(f"\nTransposed:\n{transposed}")
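You don't always have to work out every dimension yourself. Passing -1 for one dimension asks NumPy to figure it out from the array's size. A short sketch:

```python
import numpy as np

arr = np.arange(12)

# -1 means "work this dimension out for me"
three_rows = arr.reshape(3, -1)
print(three_rows.shape)  # (3, 4)

two_cols = arr.reshape(-1, 2)
print(two_cols.shape)  # (6, 2)

# The total number of elements must still fit, otherwise NumPy raises ValueError
try:
    arr.reshape(5, -1)  # 12 elements can't be split into 5 equal rows
except ValueError as err:
    print(f"Cannot reshape: {err}")
```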
Practical Example: Analyzing Sales Data
Let's put it all together with a real-world example.
# Sales data: [Product A, Product B, Product C, Product D]
# Each row is a different month
sales = np.array([
    [120, 145, 98, 167],   # January
    [135, 152, 103, 178],  # February
    [142, 148, 110, 185],  # March
    [155, 160, 115, 190],  # April
    [168, 172, 122, 195]   # May
])
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
products = ['Product A', 'Product B', 'Product C', 'Product D']
print("Sales Data (units sold):")
print(sales)
# Total sales per month
monthly_totals = np.sum(sales, axis=1)
print("\nTotal sales per month:")
for month, total in zip(months, monthly_totals):
    print(f"  {month}: {total} units")
# Total sales per product
product_totals = np.sum(sales, axis=0)
print("\nTotal sales per product:")
for product, total in zip(products, product_totals):
    print(f"  {product}: {total} units")
# Best and worst performing products
best_product_idx = np.argmax(product_totals)
worst_product_idx = np.argmin(product_totals)
print(f"\nBest performer: {products[best_product_idx]} ({product_totals[best_product_idx]} units)")
print(f"Worst performer: {products[worst_product_idx]} ({product_totals[worst_product_idx]} units)")
# Average sales per product
print("\nAverage monthly sales per product:")
for product, avg in zip(products, np.mean(sales, axis=0)):
    print(f"  {product}: {avg:.1f} units/month")
# Growth: compare last month to first month
growth = ((sales[-1] - sales[0]) / sales[0]) * 100
print("\nGrowth from January to May:")
for product, g in zip(products, growth):
    print(f"  {product}: {g:.1f}%")
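Comparing January to May hides what happened in between. If you also want the month-to-month change, np.diff is one way to get it. This is a sketch that extends the example; monthly_change and total_change are names I made up.

```python
import numpy as np

sales = np.array([
    [120, 145, 98, 167],   # January
    [135, 152, 103, 178],  # February
    [142, 148, 110, 185],  # March
    [155, 160, 115, 190],  # April
    [168, 172, 122, 195]   # May
])
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

# Difference between each month and the one before it, per product
monthly_change = np.diff(sales, axis=0)
print("Month-to-month change per product:")
print(monthly_change)

# Total change across all products for each pair of months
total_change = monthly_change.sum(axis=1)
for prev, cur, change in zip(months[:-1], months[1:], total_change):
    print(f"  {prev} -> {cur}: {int(change):+d} units")
```

A negative number in monthly_change means that product sold fewer units than the month before (Product B dips from February to March).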
Sorting and Finding Elements
data = np.array([45, 23, 67, 12, 89, 34, 56])
# Sort array
sorted_data = np.sort(data)
print(f"Original: {data}")
print(f"Sorted: {sorted_data}")
# Find indices of sorted elements
sort_indices = np.argsort(data)
print(f"Sort indices: {sort_indices}")
print(f"Using indices: {data[sort_indices]}")
# Find where condition is true
above_50 = np.where(data > 50)
print(f"\nIndices where > 50: {above_50[0]}")
print(f"Values > 50: {data[above_50]}")
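np.where has a second form that I use even more often: give it a condition plus two values, and it picks one or the other for each element. A minimal sketch (the labels and the cap of 50 are just for illustration):

```python
import numpy as np

data = np.array([45, 23, 67, 12, 89, 34, 56])

# np.where(condition, value_if_true, value_if_false)
labels = np.where(data > 50, "high", "low")
print(labels)  # ['low' 'low' 'high' 'low' 'high' 'low' 'high']

# Cap every value above 50 at 50, leave the rest unchanged
capped = np.where(data > 50, 50, data)
print(capped)  # [45 23 50 12 50 34 50]
```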
# Unique values
data_with_dupes = np.array([1, 2, 2, 3, 3, 3, 4])
unique = np.unique(data_with_dupes)
print(f"\nWith duplicates: {data_with_dupes}")
print(f"Unique values: {unique}")
Combining Arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Stack vertically (rows)
v_stacked = np.vstack([arr1, arr2])
print("Vertical stack:")
print(v_stacked)
# Stack horizontally (columns)
h_stacked = np.hstack([arr1, arr2])
print(f"\nHorizontal stack: {h_stacked}")
# Concatenate
concatenated = np.concatenate([arr1, arr2])
print(f"Concatenated: {concatenated}")
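With 1D arrays, hstack and concatenate give the same result. The difference shows up with 2D arrays, where the axis argument decides whether you add rows or columns. A small sketch with two made-up 2x2 arrays:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# axis=0: add b's rows underneath a (same as np.vstack)
rows_added = np.concatenate([a, b], axis=0)
print(rows_added.shape)  # (4, 2)

# axis=1: add b's columns to the right of a (same as np.hstack)
cols_added = np.concatenate([a, b], axis=1)
print(cols_added.shape)  # (2, 4)
print(cols_added)
# [[1 2 5 6]
#  [3 4 7 8]]
```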
Practical Example: Grade Calculator
# Student exam scores
exam1 = np.array([85, 90, 78, 92, 88])
exam2 = np.array([88, 85, 82, 95, 90])
exam3 = np.array([90, 92, 80, 93, 91])
homework = np.array([95, 88, 85, 90, 92])
# Combine all scores
all_scores = np.vstack([exam1, exam2, exam3, homework])
print("All scores:")
print(all_scores)
# Calculate final grades (weighted average)
# Exams: 25% each, Homework: 25%
weights = np.array([0.25, 0.25, 0.25, 0.25])
# Weighted average for each student
final_grades = np.average(all_scores, axis=0, weights=weights)
print("\nFinal grades:")
for i, grade in enumerate(final_grades, 1):
    print(f"  Student {i}: {grade:.2f}")
# Grade distribution
print(f"\nClass average: {np.mean(final_grades):.2f}")
print(f"Highest grade: {np.max(final_grades):.2f}")
print(f"Lowest grade: {np.min(final_grades):.2f}")
# Letter grades
def get_letter_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'F'
print("\nLetter grades:")
for i, grade in enumerate(final_grades, 1):
    letter = get_letter_grade(grade)
    print(f"  Student {i}: {grade:.2f} ({letter})")
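The loop above works fine for five students. For a larger class, you could let NumPy assign all the letters at once with np.select. This is an alternative sketch, not a replacement for the function above; the final_grades values are the ones the example produces for these scores.

```python
import numpy as np

# Final grades from the example above (equal 25% weights)
final_grades = np.array([89.5, 88.75, 81.25, 92.5, 90.25])

# Conditions are checked in order; the first one that is True wins
conditions = [
    final_grades >= 90,
    final_grades >= 80,
    final_grades >= 70,
    final_grades >= 60,
]
choices = ['A', 'B', 'C', 'D']

# Anything that matches no condition gets the default, 'F'
letters = np.select(conditions, choices, default='F')
print(letters)  # ['B' 'B' 'B' 'A' 'A']
```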
Random Numbers for Data Science
NumPy's random module is essential for simulations and testing.
# Set seed for reproducibility
np.random.seed(42)
# Random floats between 0 and 1
random_floats = np.random.rand(5)
print(f"Random floats: {random_floats}")
# Random integers
random_ints = np.random.randint(1, 100, size=10)
print(f"Random integers (1-99): {random_ints}")
# Normal distribution (bell curve)
normal_data = np.random.randn(1000)
print(f"\nNormal distribution:")
print(f" Mean: {np.mean(normal_data):.4f}")
print(f" Std: {np.std(normal_data):.4f}")
# Random choice from array
fruits = np.array(['apple', 'banana', 'orange', 'grape'])
random_fruit = np.random.choice(fruits)
print(f"\nRandom fruit: {random_fruit}")
# Shuffle array
arr = np.arange(10)
np.random.shuffle(arr)
print(f"Shuffled: {arr}")
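The functions above still work, but the NumPy documentation recommends the newer Generator interface for new code. Here is a sketch of the same steps using np.random.default_rng. The seed 42 simply mirrors the example above, and the numbers it produces will not match the np.random.seed(42) results.

```python
import numpy as np

# Create a random number generator with a fixed seed for reproducibility
rng = np.random.default_rng(42)

# Random floats between 0 and 1
random_floats = rng.random(5)
print(f"Random floats: {random_floats}")

# Random integers from 1 to 99 (the upper bound is excluded)
random_ints = rng.integers(1, 100, size=10)
print(f"Random integers (1-99): {random_ints}")

# Standard normal distribution (mean 0, standard deviation 1)
normal_data = rng.standard_normal(1000)
print(f"Mean: {normal_data.mean():.4f}, Std: {normal_data.std():.4f}")

# Random choice from an array
fruits = np.array(['apple', 'banana', 'orange', 'grape'])
random_fruit = rng.choice(fruits)
print(f"Random fruit: {random_fruit}")

# Shuffle an array in place
arr = np.arange(10)
rng.shuffle(arr)
print(f"Shuffled: {arr}")
```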
Common Mistakes to Avoid
Mistake 1: Forgetting Array Shape
# Shape matters!
arr_1d = np.array([1, 2, 3])
arr_2d = np.array([[1, 2, 3]])
print(f"1D shape: {arr_1d.shape}") # (3,)
print(f"2D shape: {arr_2d.shape}") # (1, 3)
# They look similar but behave differently
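To make "behave differently" concrete, here is a short sketch showing where the two arrays diverge, plus one way to convert between them:

```python
import numpy as np

arr_1d = np.array([1, 2, 3])
arr_2d = np.array([[1, 2, 3]])

# Indexing: the 1D array gives a number, the 2D array gives a whole row
print(arr_1d[0])  # 1
print(arr_2d[0])  # [1 2 3]

# Transpose: does nothing to a 1D array, but flips a 2D one
print(arr_1d.T.shape)  # (3,)
print(arr_2d.T.shape)  # (3, 1)

# Converting between the two
as_row = arr_1d.reshape(1, -1)  # 1D -> 2D with a single row
back_to_1d = arr_2d.flatten()   # 2D -> 1D
print(as_row.shape)      # (1, 3)
print(back_to_1d.shape)  # (3,)
```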
Mistake 2: Modifying Views Instead of Copies
original = np.array([1, 2, 3, 4, 5])
view = original[1:4] # This is a VIEW, not a copy
view[0] = 999
print(f"Original: {original}") # Changed!
# To avoid this, make a copy
original = np.array([1, 2, 3, 4, 5])
actual_copy = original[1:4].copy()
actual_copy[0] = 999
print(f"Original (with copy): {original}") # Unchanged
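If you're ever unsure whether you're holding a view or a copy, you can ask NumPy directly. A small sketch using np.shares_memory:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

view = original[1:4]                # slicing gives a view
actual_copy = original[1:4].copy()  # .copy() gives independent data
filtered = original[original > 2]   # boolean filtering gives a copy too

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, actual_copy))  # False
print(np.shares_memory(original, filtered))     # False

# A view also remembers which array it came from
print(view.base is original)  # True
```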
Mistake 3: Comparing Arrays Wrong
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 3])
# WRONG: arr1 == arr2 returns an array of booleans, not a single True/False
# if arr1 == arr2: # Raises ValueError: the truth value of an array is ambiguous
# RIGHT: Use np.array_equal()
if np.array_equal(arr1, arr2):
    print("Arrays are equal!")
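Two related tools are worth knowing here: .all() and .any() turn a boolean array into a single True or False, and np.allclose handles floats, where tiny rounding differences make exact comparison unreliable. A quick sketch:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 4])

matches = arr1 == arr2
print(matches)        # [ True  True False]
print(matches.all())  # False: not every element matches
print(matches.any())  # True: at least one element matches

# Floats: 0.1 + 0.2 is not exactly 0.3 in floating point
a = np.array([0.1 + 0.2])
b = np.array([0.3])
print(np.array_equal(a, b))  # False
print(np.allclose(a, b))     # True: equal within a small tolerance
```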
Quick Reference Cheat Sheet
# Creation
np.array([1, 2, 3]) # From list
np.zeros(5) # [0., 0., 0., 0., 0.] (floats by default)
np.ones(5) # [1., 1., 1., 1., 1.] (floats by default)
np.arange(10) # [0, 1, 2, ..., 9]
np.linspace(0, 10, 5) # 5 numbers from 0 to 10
np.random.rand(3, 3) # 3×3 random array
# Info
arr.shape # Dimensions
arr.dtype # Data type
arr.size # Total elements
arr.ndim # Number of dimensions
# Operations
arr + 10 # Add to all
arr * 2 # Multiply all
arr > 5 # Boolean array
# Statistics
np.mean(arr) # Average
np.median(arr) # Middle value
np.std(arr) # Standard deviation
np.min(arr) # Minimum
np.max(arr) # Maximum
np.sum(arr) # Total sum
# Indexing
arr[0] # First element
arr[-1] # Last element
arr[2:5] # Slice
arr[arr > 5] # Filter
# Reshaping
arr.reshape(2, 3) # New shape
arr.flatten() # To 1D
arr.T # Transpose
Your Next Steps
Congratulations! You now understand NumPy fundamentals. Here's what to learn next:
- Practice - Work with real datasets (CSV files, Excel)
- Pandas - Built on NumPy, makes data analysis even easier
- Matplotlib - Visualize your NumPy arrays as charts
- Machine Learning - Use NumPy with scikit-learn
Remember: NumPy sits underneath much of the Python data science stack, including Pandas and scikit-learn. You've just learned the foundation those tools build on.
Practice Exercises
Try these on your own:
- Create an array of 100 random numbers and find mean, median, std
- Simulate dice rolls: roll two dice 1000 times, calculate average
- Create a 5×5 multiplication table using NumPy
- Analyze temperature data: create 30 random temperatures, find average, max, min
- Calculate grades: given test scores, compute weighted averages
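Give the exercises a real attempt first. If you get stuck, here is one possible approach to the dice and multiplication table exercises. It's a sketch, not the only answer; the seed 0 and the variable names are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the results repeat

# Dice rolls: 1000 rows, 2 dice per row, each value from 1 to 6
rolls = rng.integers(1, 7, size=(1000, 2))
totals = rolls.sum(axis=1)  # add the two dice in each row
print(f"Average total of two dice: {totals.mean():.2f}")

# 5x5 multiplication table: every pairing of 1..5 with 1..5
n = np.arange(1, 6)
table = np.outer(n, n)
print(table)
```

The average total should land close to 7, the expected value for two fair dice.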
Cover image by Olav Ahrens Røtne on Unsplash