CHAPTER 07 Beginner

NumPy Broadcasting and Vectorization

Updated: May 18, 2026
5 min read


1. Chapter Introduction

Broadcasting is NumPy's mechanism for performing arithmetic on arrays of different shapes. Vectorization replaces explicit Python loops with whole-array operations that run in NumPy's compiled code, which typically makes simple element-wise work 50-200x faster. These are the two most important performance concepts in NumPy.

2. Vectorization vs Python Loops

python
import numpy as np
import time

n = 1_000_000
data = np.random.rand(n)

# Python loop (slow): the interpreter processes one element at a time
start = time.perf_counter()
result_py = [x * 2 + 1 for x in data]
py_time = time.perf_counter() - start

# NumPy vectorized (fast): one expression runs in compiled code over the whole array
start = time.perf_counter()
result_np = data * 2 + 1
np_time = time.perf_counter() - start

print(f"Python loop: {py_time:.3f}s")
print(f"NumPy:       {np_time:.4f}s")
print(f"Speedup:     {py_time / np_time:.0f}x faster")
# Timings vary by machine; NumPy is typically around 50-200x faster here
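A single timing measurement can be noisy. As a sketch, the standard-library timeit module can repeat each version and keep the fastest run, after first checking that both versions compute the same values. The function names loop_version and vectorized_version are illustrative, and the seeded generator np.random.default_rng(0) is an assumption added only for reproducibility.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption) so runs are reproducible
data = rng.random(100_000)


def loop_version(arr):
    # Element by element in the Python interpreter
    return np.array([x * 2 + 1 for x in arr])


def vectorized_version(arr):
    # One expression evaluated by NumPy's compiled loops
    return arr * 2 + 1


# The two versions must agree before their timings are worth comparing
assert np.allclose(loop_version(data), vectorized_version(data))

# Repeat each call; min() keeps the run least disturbed by other processes
loop_t = min(timeit.repeat(lambda: loop_version(data), number=5, repeat=3)) / 5
vec_t = min(timeit.repeat(lambda: vectorized_version(data), number=5, repeat=3)) / 5

print(f"loop: {loop_t:.4f}s  vectorized: {vec_t:.6f}s  speedup: {loop_t / vec_t:.0f}x")
```

Taking the minimum of several repeats is the convention the timeit documentation recommends, since slower runs usually reflect interference rather than the code being measured.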

3. Broadcasting Rules

text
Broadcasting Rules (shapes are compared dimension by dimension, starting from the trailing one):
1. If the arrays have different ndim, prepend 1s to the shape of the array with fewer dimensions
2. Arrays are compatible if, in every dimension, the sizes are equal OR one of them is 1
3. The output size in each dimension is the larger of the two (a size-1 dimension is stretched)

Examples:
arr (3, 4) + scalar ()
→ scalar broadcasts to (3, 4)
→ Result: (3, 4)

arr (3, 4) + row (1, 4)
→ row broadcasts to (3, 4)
→ Result: (3, 4)

col (3, 1) + row (1, 4)
→ col broadcasts to (3, 4), row to (3, 4)
→ Result: (3, 4)

INCOMPATIBLE: (3, 4) + (3, 3) → Error! The last dimensions are 4 and 3 (4 ≠ 3), and neither is 1
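The three rules are mechanical enough to write down directly. The sketch below uses a hypothetical helper, broadcast_shape, that follows them step by step and checks itself against NumPy's own np.broadcast_shapes (available in NumPy 1.20 and later). One refinement to rule 3: a size-0 dimension broadcast against 1 gives 0, so the helper keeps the non-1 size instead of taking the maximum.

```python
import numpy as np


def broadcast_shape(a, b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    # Rule 1: left-pad the shorter shape with 1s so both have the same ndim
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    result = []
    for da, db in zip(a, b):
        # Rule 2: sizes must be equal, or one of them must be 1
        if da != db and 1 not in (da, db):
            raise ValueError(f"incompatible dimensions {da} and {db}")
        # Rule 3: keep the non-1 size (the larger one for positive sizes)
        result.append(da if db == 1 else db)
    return tuple(result)


print(broadcast_shape((3, 4), ()))      # (3, 4)
print(broadcast_shape((3, 4), (1, 4)))  # (3, 4)
print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)

# NumPy's own implementation of the same rules agrees
assert broadcast_shape((3, 1), (1, 4)) == np.broadcast_shapes((3, 1), (1, 4))

try:
    broadcast_shape((3, 4), (3, 3))
except ValueError as err:
    print("Error:", err)                # Error: incompatible dimensions 4 and 3
```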

4. Broadcasting Examples

python
import numpy as np

# Example 1: Scalar broadcasting
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix + 10)     # Adds 10 to every element
print(matrix * 2)      # Doubles every element

# Example 2: Row vector broadcasting
row = np.array([1, 2, 3])          # shape (3,)
matrix = np.ones((4, 3))           # shape (4, 3)
result = matrix + row              # row broadcasts to (4, 3)
print(result)

# Example 3: Column vector broadcasting
col = np.array([[1], [2], [3], [4]])   # shape (4, 1)
matrix = np.ones((4, 3))               # shape (4, 3)
result = matrix + col                  # col broadcasts to (4, 3)
print(result)
# [[2. 2. 2.]
#  [3. 3. 3.]
#  [4. 4. 4.]
#  [5. 5. 5.]]

# Example 4: Two vectors → outer sum
x = np.array([1, 2, 3])             # shape (3,)
y = np.array([[10], [20], [30]])    # shape (3, 1)
outer = x + y   # (3,) + (3, 1) → (3, 3); element (i, j) is y[i] + x[j]
print(outer)
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]
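Example 4 combines its vectors with +, which yields an outer sum; swapping in * gives the outer product. A minimal sketch, assuming two plain 1D inputs: indexing with np.newaxis (equivalent to reshape(-1, 1)) turns one vector into a column so broadcasting fills the whole grid, and np.outer serves as a cross-check.

```python
import numpy as np

x = np.array([1, 2, 3])       # shape (3,)
y = np.array([10, 20, 30])    # shape (3,)

col = x[:, np.newaxis]        # shape (3, 1), same as x.reshape(-1, 1)

outer_sum = col + y           # (3, 1) + (3,) -> (3, 3): element (i, j) is x[i] + y[j]
outer_prod = col * y          # (3, 1) * (3,) -> (3, 3): element (i, j) is x[i] * y[j]

print(outer_prod)
# [[10 20 30]
#  [20 40 60]
#  [30 60 90]]
print(np.array_equal(outer_prod, np.outer(x, y)))   # True
```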

5. Practical Vectorization Patterns

python
import numpy as np

# Pattern 1: Normalize data (z-score)
data = np.array([85, 92, 78, 96, 67])
mean = np.mean(data)
std = np.std(data)
normalized = (data - mean) / std    # Vectorized: the scalars broadcast over the array
print(normalized.round(3))

# Pattern 2: Min-Max scaling to the range [0, 1]
scaled = (data - data.min()) / (data.max() - data.min())
print(scaled.round(3))

# Pattern 3: Euclidean norms (distance from the origin to each point)
points = np.array([[1, 2], [4, 6], [7, 1]])
distances = np.sqrt(np.sum(points**2, axis=1))   # sqrt(x**2 + y**2) per row
print(distances.round(2))    # [2.24 7.21 7.07]

# Pattern 4: Conditional vectorized assignment
sales = np.array([120, 340, 280, 510, 90, 450])
bonus = np.where(sales > 300, sales * 0.1, sales * 0.05)
print(bonus)   # 10% bonus if > 300, else 5%
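Pattern 3 measures each point's distance from the origin. Distances between every pair of points follow from the same broadcasting idea: insert new axes so that an (n, 1, 2) array minus a (1, n, 2) array expands to (n, n, 2) coordinate differences. This is a sketch rather than an optimized routine; the variable names diff and dist are illustrative.

```python
import numpy as np

points = np.array([[1, 2], [4, 6], [7, 1]])   # n = 3 points in 2D, shape (3, 2)

# (3, 1, 2) - (1, 3, 2) broadcasts to (3, 3, 2): every point minus every point
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]

# Square, sum over the coordinate axis, and take the root: shape (3, 3)
dist = np.sqrt(np.sum(diff**2, axis=-1))

print(dist.round(2))
# dist[i, j] is the distance between point i and point j;
# dist[0, 1] is 5.0 because (1, 2) and (4, 6) differ by (3, 4)
```

The (n, n, 2) intermediate grows quadratically with n, so for large point sets a dedicated routine such as scipy.spatial.distance.cdist is usually the better choice.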

6. Mini Project: Salary Calculator System

python
import numpy as np

# Employee data
employees = {
    'names': np.array(['Alice', 'Bob', 'Carol', 'David', 'Eve', 'Frank']),
    'base_salary': np.array([55000, 72000, 48000, 88000, 61000, 95000]),
    'performance': np.array([0.95, 0.82, 1.05, 0.78, 1.12, 0.91]),  # multiplier
    'years': np.array([3, 7, 2, 12, 5, 15]),
    'department': np.array(['Eng', 'Mkt', 'Sales', 'Eng', 'Mkt', 'Eng'])
}

# Vectorized calculations (no loops!)
base = employees['base_salary']
perf = employees['performance']
years = employees['years']

# Annual bonus: 50% of base per point of performance above 0.8
# (a 1.0 score earns 10% of base; scores below 0.8 would go negative)
bonus = base * (perf - 0.8) * 0.5
bonus = np.maximum(bonus, 0)   # No negative bonus

# Seniority raise: 2% per year (capped at 20%)
seniority_pct = np.minimum(years * 0.02, 0.20)
seniority_raise = base * seniority_pct

# Total compensation
total = base + bonus + seniority_raise

# Tax brackets (vectorized; simplified: one flat rate on the whole total, not marginal rates)
tax_rate = np.where(total > 100000, 0.35,
           np.where(total > 75000, 0.28,
           np.where(total > 50000, 0.22, 0.15)))

net = total * (1 - tax_rate)

print(f"{'Name':<8} {'Base':>8} {'Bonus':>7} {'Seniority':>10} {'Total':>9} {'Tax%':>5} {'Net':>9}")
print("-" * 62)
for i, name in enumerate(employees['names']):
    print(f"{name:<8} ${base[i]:>7,.0f} ${bonus[i]:>6,.0f} ${seniority_raise[i]:>9,.0f} ${total[i]:>8,.0f} {tax_rate[i]*100:>4.0f}% ${net[i]:>8,.0f}")

print(f"\nTeam total payroll: ${np.sum(total):,.0f}")
print(f"Average net salary: ${np.mean(net):,.0f}")
print(f"Highest earner: {employees['names'][np.argmax(net)]}")
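The employees dictionary defines a department array that the calculations never use. A per-department payroll can stay loop-free as well. The sketch below repeats the project's formulas so it runs on its own, then uses np.unique(..., return_inverse=True) to give each department an integer code and np.bincount with weights to sum compensation per code, which acts as a vectorized group-by. The names names, codes, payroll, and headcount are illustrative.

```python
import numpy as np

# Same inputs as the mini project
base = np.array([55000, 72000, 48000, 88000, 61000, 95000])
perf = np.array([0.95, 0.82, 1.05, 0.78, 1.12, 0.91])
years = np.array([3, 7, 2, 12, 5, 15])
dept = np.array(['Eng', 'Mkt', 'Sales', 'Eng', 'Mkt', 'Eng'])

# Same compensation formulas as the mini project
bonus = np.maximum(base * (perf - 0.8) * 0.5, 0)
seniority_raise = base * np.minimum(years * 0.02, 0.20)
total = base + bonus + seniority_raise

# names: sorted unique departments; codes[i]: index of dept[i] within names
names, codes = np.unique(dept, return_inverse=True)

# bincount adds up the weights that share a code: a vectorized group-by sum
payroll = np.bincount(codes, weights=total)
headcount = np.bincount(codes)

for name, count, pay in zip(names, headcount, payroll):   # loop only for printing
    print(f"{name:<6} {count} staff  ${pay:>10,.0f}")
```

For anything beyond simple sums, a pandas groupby is usually more convenient; this version shows that the aggregation itself needs no Python loop.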

7. Common Mistakes

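  • Broadcasting incompatibility: (3, 4) + (3, 3) fails because the last dimensions (4 and 3) differ and neither is 1. Shapes must be compatible dimension by dimension, starting from the trailing one. A common case is adding one value per row: (3, 4) + (3,) fails, but reshape(-1, 1) turns the (3,) array into a (3, 1) column that broadcasts across the columns.
  • np.where vs Python if: np.where(condition, x, y) is vectorized and picks x or y element by element. A Python if needs a single True/False value, so testing a multi-element array such as arr > 0 with if raises a ValueError (the truth value is ambiguous).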
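Both mistakes are easy to reproduce. The sketch below uses small made-up arrays (matrix, offsets, scores, and total are illustrative): it triggers the shape error and fixes it with reshape(-1, 1), shows the ValueError a Python if raises on an array, and rewrites a nested np.where like the mini project's tax brackets with np.select, which checks its conditions in order and uses the first one that is True.

```python
import numpy as np

# Mistake 1: adding one value per ROW to a (3, 4) matrix
matrix = np.ones((3, 4))
offsets = np.array([10, 20, 30])          # shape (3,)
try:
    _ = matrix + offsets                  # trailing dims 4 vs 3 -> ValueError
except ValueError as err:
    print("Broadcast error:", err)

fixed = matrix + offsets.reshape(-1, 1)   # (3, 4) + (3, 1) -> (3, 4)
print(fixed[:, 0])                        # [11. 21. 31.]

# Mistake 2: using a Python if on a whole array
scores = np.array([45, 72, 88])
try:
    if scores > 50:                       # if needs one bool, not three
        print("unreachable")
except ValueError as err:
    print("if error:", err)

labels = np.where(scores > 50, "pass", "fail")   # decided element by element
print(labels)                             # ['fail' 'pass' 'pass']

# More than two branches: np.select instead of nested np.where
total = np.array([48000, 62000, 90000, 120000])  # illustrative totals
rate_where = np.where(total > 100000, 0.35,
             np.where(total > 75000, 0.28,
             np.where(total > 50000, 0.22, 0.15)))
rate_select = np.select([total > 100000, total > 75000, total > 50000],
                        [0.35, 0.28, 0.22], default=0.15)   # first True condition wins
print(np.array_equal(rate_where, rate_select))   # True
```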

8. MCQs

Question 1

Why is NumPy vectorization faster than a Python loop?

Question 2

What does broadcasting allow?

Question 3

What is the result shape of (3, 4) + (4,)?

Question 4

What does np.where(cond, x, y) return?

Question 5

What does normalizing data mean?

Question 6

What shape does (3, 1) + (1, 4) broadcast to?

Question 7

What does np.maximum(a, 0) return?

Question 8

What is the typical speedup of NumPy over a Python loop?

Question 9

What is the z-score normalization formula?

Question 10

What does reshape(-1, 1) convert a 1D array into?

9. Interview Questions

  • Q: Explain NumPy broadcasting with an example.
  • Q: Why is vectorized code preferred over Python loops in data science?

10. Summary

Vectorization replaces Python loops with whole-array operations, typically giving 50-200x speedups for simple element-wise work. Broadcasting automatically stretches size-1 (or missing) dimensions so arrays of compatible shapes can be combined without explicit copies. np.where provides vectorized conditional logic. These patterns are essential for writing production-quality data science code.

11. Next Chapter Recommendation

In Chapter 8: NumPy Random Module, we generate random data for simulations, statistical sampling, and machine learning dataset creation.
