Updated: May 18, 2026
# CHAPTER 8

NumPy Random Module

1. Chapter Introduction

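Random number generation is essential for simulations, statistical sampling, data augmentation, and machine learning. NumPy's random module provides fast, statistically high-quality pseudorandom number generation covering all major probability distributions. It is not cryptographically secure: for security-sensitive values such as tokens or passwords, use Python's secrets module instead.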

2. Random Number Generation

python
import numpy as np

# Seed for reproducibility (same seed → same random numbers)
rng = np.random.default_rng(seed=42)  # Modern API (recommended)

# Or legacy API (seeds NumPy's global state; it does not affect rng above):
np.random.seed(42)

# Uniform random floats [0.0, 1.0)
print(rng.random(5))           # 5 random floats

# Random integers
print(rng.integers(1, 100, 5))     # 5 integers from 1 to 99 (upper bound 100 is exclusive)
print(rng.integers(1, 7, (3, 4)))  # 3x4 matrix of dice values 1-6

# Standard normal distribution (mean=0, std=1)
print(rng.standard_normal(5))  # 5 normally distributed values

# Normal with custom mean and std
print(rng.normal(loc=170, scale=10, size=5))  # Heights: mean 170cm, std 10
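
The seeding behaviour described above can be checked directly. The following is a minimal sketch, not part of the original chapter; the variable names (rng_a, rng_b, rng_c, dice) are illustrative only. It shows that two generators created with the same seed yield the same stream, and that integers() can include the upper bound when endpoint=True is passed.

```python
import numpy as np

# Two generators built from the same seed produce identical streams.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
draws_a = rng_a.random(5)
draws_b = rng_b.random(5)
print("Same seed, same numbers:", np.array_equal(draws_a, draws_b))

# A generator built from a different seed produces a different stream.
rng_c = np.random.default_rng(seed=43)
draws_c = rng_c.random(5)
print("Different seed, same numbers:", np.array_equal(draws_a, draws_c))

# integers() excludes the upper bound by default;
# endpoint=True makes the range inclusive, so this draws from 1..6.
dice = rng_a.integers(1, 6, size=1000, endpoint=True)
print("Dice range:", dice.min(), "to", dice.max())
```

Keeping one generator object per experiment, rather than re-seeding repeatedly, is usually enough to make a whole script repeatable.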

3. Probability Distributions

python
import numpy as np

rng = np.random.default_rng(42)

# Uniform distribution
uniform = rng.uniform(low=0, high=100, size=1000)

# Normal (Gaussian) distribution — bell curve
normal = rng.normal(loc=0, scale=1, size=1000)

# Binomial (n trials, probability p)
binomial = rng.binomial(n=10, p=0.5, size=1000)  # Coin flips

# Poisson (events per interval)
poisson = rng.poisson(lam=5, size=1000)  # 5 events per hour average

# Exponential (time between events)
exponential = rng.exponential(scale=2.0, size=1000)

# Beta (0-1, useful for probabilities)
beta = rng.beta(a=2, b=5, size=1000)

# Print distribution stats
for name, dist in [('Normal', normal), ('Binomial', binomial), ('Poisson', poisson)]:
    print(f"{name}: mean={np.mean(dist):.2f}, std={np.std(dist):.2f}")
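
One way to build intuition for these distributions is to compare sample statistics against the textbook formulas. The sketch below is an illustration under stated assumptions, not the chapter's own code: it uses a larger sample size (100,000) than the examples above, and the dictionary name checks is made up for this example. The formulas used are the standard ones: a binomial has mean n*p and standard deviation sqrt(n*p*(1-p)), a Poisson has mean lam and standard deviation sqrt(lam), and an exponential has mean and standard deviation both equal to scale.

```python
import numpy as np

rng = np.random.default_rng(42)
size = 100_000  # larger than the chapter's 1000 so the estimates are tighter

n, p = 10, 0.5
lam = 5
scale = 2.0

binomial = rng.binomial(n=n, p=p, size=size)
poisson = rng.poisson(lam=lam, size=size)
exponential = rng.exponential(scale=scale, size=size)

# name -> (sample, theoretical mean, theoretical standard deviation)
checks = {
    "Binomial": (binomial, n * p, np.sqrt(n * p * (1 - p))),
    "Poisson": (poisson, lam, np.sqrt(lam)),
    "Exponential": (exponential, scale, scale),
}

for name, (sample, mean_theory, std_theory) in checks.items():
    print(
        f"{name}: mean={sample.mean():.3f} (theory {mean_theory:.3f}), "
        f"std={sample.std():.3f} (theory {std_theory:.3f})"
    )
```

The sample values should land close to the theoretical ones, and the gap typically shrinks as size grows.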

4. Sampling and Shuffling

python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(1, 21)   # [1, 2, ..., 20]

# Random choice (with replacement)
sample_with = rng.choice(data, size=5, replace=True)
print("With replacement:", sample_with)

# Random choice (without replacement)
sample_without = rng.choice(data, size=5, replace=False)
print("Without replacement:", sample_without)

# Shuffle in-place
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
rng.shuffle(arr)
print("Shuffled:", arr)

# Permutation (returns new shuffled array)
original = np.arange(10)
shuffled = rng.permutation(original)
print("Permuted:", shuffled)

# Weighted random choice
items = np.array(['apple', 'banana', 'cherry'])
weights = np.array([0.5, 0.3, 0.2])  # Probabilities (must sum to 1)
sample = rng.choice(items, size=10, p=weights)
print("Weighted:", np.unique(sample, return_counts=True))
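
The difference between shuffle and permutation is easy to miss, so here is a small sketch that makes it explicit. It also shows one possible way to handle weights that do not already sum to 1, by normalizing them first. The variable names (result, raw_weights, probs) are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# shuffle() reorders the array in place and returns None.
arr = np.arange(10)
result = rng.shuffle(arr)
print("shuffle returned:", result)
print("arr is now:", arr)

# permutation() leaves its input untouched and returns a shuffled copy.
original = np.arange(10)
shuffled = rng.permutation(original)
print("original unchanged:", np.array_equal(original, np.arange(10)))
print("shuffled copy:", shuffled)

# choice() requires probabilities that sum to 1,
# so raw weights (here 5:3:2) are normalized before use.
items = np.array(["apple", "banana", "cherry"])
raw_weights = np.array([5.0, 3.0, 2.0])
probs = raw_weights / raw_weights.sum()
sample = rng.choice(items, size=10, p=probs)
print("weighted sample:", sample)
```

A common slip is writing arr = rng.shuffle(arr), which replaces the array with None.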

5. Practical: Simulations

python
import numpy as np

rng = np.random.default_rng(42)

# Simulation 1: Monte Carlo estimate of π
n_points = 1_000_000
x = rng.uniform(-1, 1, n_points)
y = rng.uniform(-1, 1, n_points)
inside_circle = (x**2 + y**2) <= 1
pi_estimate = 4 * np.sum(inside_circle) / n_points
print(f"π estimate: {pi_estimate:.4f}")  # ~3.1416

# Simulation 2: Stock price simulation (random walk)
n_days = 252  # Trading days in a year
daily_returns = rng.normal(loc=0.0005, scale=0.02, size=n_days)
price = 100 * np.cumprod(1 + daily_returns)
print(f"Final stock price: ${price[-1]:.2f}")
print(f"Return: {(price[-1]/100 - 1)*100:.1f}%")

# Simulation 3: Bootstrap confidence interval
data = np.array([34, 45, 67, 23, 78, 56, 89, 45, 34, 67])
n_bootstrap = 10000
bootstrap_means = [rng.choice(data, len(data), replace=True).mean()
                   for _ in range(n_bootstrap)]
bootstrap_means = np.array(bootstrap_means)
ci_lower = np.percentile(bootstrap_means, 2.5)
ci_upper = np.percentile(bootstrap_means, 97.5)
print(f"95% CI for mean: [{ci_lower:.1f}, {ci_upper:.1f}]")
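
The bootstrap above builds its resamples in a Python list comprehension. As an alternative sketch, assuming the same data and number of replicates, all resamples can be drawn in a single choice() call by requesting a 2-D size and then averaging along each row. This is a variation for illustration, not the chapter's original method; the name resamples is introduced here.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([34, 45, 67, 23, 78, 56, 89, 45, 34, 67])
n_bootstrap = 10_000

# Draw every resample at once: one row per bootstrap replicate,
# each row holding len(data) values sampled with replacement.
resamples = rng.choice(data, size=(n_bootstrap, len(data)), replace=True)

# Mean of each replicate (row).
bootstrap_means = resamples.mean(axis=1)

# Percentile interval: the middle 95% of the bootstrap means.
ci_lower, ci_upper = np.percentile(bootstrap_means, [2.5, 97.5])
print(f"Sample mean: {data.mean():.1f}")
print(f"95% CI for mean: [{ci_lower:.1f}, {ci_upper:.1f}]")
```

The vectorized form trades memory (a 10,000 x 10 array here) for fewer Python-level loop iterations, which tends to matter more as n_bootstrap grows.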

6. Common Mistakes

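  • Not seeding for reproducibility: Machine learning experiments must be reproducible. Create a seeded generator with np.random.default_rng(seed), or call np.random.seed() if you are still using the legacy API.
  • Old np.random.rand() vs new rng.random(): The Generator returned by default_rng() uses the PCG64 bit generator by default, which has better statistical properties than the Mersenne Twister behind the legacy functions and is often faster. Prefer it for new code.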
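
A related pitfall is mixing the two APIs. The sketch below illustrates that np.random.seed() only affects the legacy global state and has no effect on a Generator, and that passing a generator into a function keeps its output repeatable. The function noisy_signal is a hypothetical helper invented for this example.

```python
import numpy as np

# np.random.seed() seeds only the legacy global state.
# A Generator is controlled solely by the seed given to default_rng().
np.random.seed(0)
first = np.random.default_rng(123).random(3)
np.random.seed(999)
second = np.random.default_rng(123).random(3)
print("Generator unaffected by np.random.seed:", np.array_equal(first, second))


def noisy_signal(n, rng):
    """Hypothetical helper: one sine period plus Gaussian noise drawn from rng."""
    t = np.linspace(0, 2 * np.pi, n)
    return np.sin(t) + rng.normal(loc=0.0, scale=0.1, size=n)


# Passing the generator explicitly makes the randomness visible in the
# function signature and makes each call repeatable.
run_1 = noisy_signal(50, np.random.default_rng(7))
run_2 = noisy_signal(50, np.random.default_rng(7))
print("Repeatable runs:", np.allclose(run_1, run_2))
```

Accepting an rng argument, instead of relying on global state, also makes such functions easier to test.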

7. MCQs

Question 1

Purpose of setting random seed?

Question 2

rng.normal(loc=0, scale=1) generates?

Question 3

rng.choice(arr, replace=False) means?

Question 4

Monte Carlo methods use?

Question 5

rng.integers(1, 7) simulates?

Question 6

rng.permutation(arr) vs rng.shuffle(arr)?

Question 7

Binomial distribution models?

Question 8

Poisson distribution models?

Question 9

np.cumprod([1.1, 1.2, 0.9]) returns?

Question 10

default_rng(seed=42) creates?

8. Interview Questions

  • Q: Why is reproducibility important in data science and how do you ensure it?
  • Q: What is the difference between uniform and normal distributions?

9. Summary

NumPy's random module provides all major distributions for simulation, sampling, and ML. Always seed with default_rng(seed) for reproducibility. Monte Carlo methods demonstrate the power of random simulation for estimation. Bootstrap sampling uses resampling to estimate confidence intervals.

10. Next Chapter Recommendation

In Chapter 9: Introduction to Pandas, we begin working with Pandas, the library that builds on NumPy to make data analysis practical.
