CHAPTER 18 Beginner

Probability Distributions

Updated: May 18, 2026

# CHAPTER 18

Probability Distributions in R

1. Chapter Introduction

Probability distributions describe how the values of a random variable are spread. For each built-in distribution, R provides four functions: density (d), cumulative probability (p), quantile (q), and random generation (r). This chapter covers three of the distributions used most often in data science: the normal, the binomial, and the Poisson.

2. R Distribution Function Pattern

text
R uses a consistent naming pattern for all distributions:
  d{dist}() — Probability density/mass function (PDF/PMF)
  p{dist}() — Cumulative distribution function (CDF)
  q{dist}() — Quantile function (inverse CDF)
  r{dist}() — Random number generation

Example for Normal distribution:
  dnorm(x, mean, sd)  — Density at x
  pnorm(q, mean, sd)  — P(X ≤ q)
  qnorm(p, mean, sd)  — Value where P(X ≤ value) = p
  rnorm(n, mean, sd)  — n random normal values

Common built-in distributions (not an exhaustive list):
  norm    — Normal
  binom   — Binomial
  pois    — Poisson
  t       — Student's t
  f       — F distribution
  chisq   — Chi-squared
  exp     — Exponential
  unif    — Uniform
  gamma   — Gamma
  beta    — Beta
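
As a quick check of the naming pattern above, the following sketch applies all four prefixes to the standard normal distribution and confirms that the q function inverts the p function. The variable names are illustrative and do not come from the chapter.

```r
# d/p/q/r pattern applied to the standard normal (mean = 0, sd = 1)
density_at_0 <- dnorm(0)             # height of the density curve at 0
prob_below_1 <- pnorm(1)             # P(X <= 1)
cutoff       <- qnorm(prob_below_1)  # inverse CDF: recovers 1

set.seed(1)                          # make the random draw reproducible
draws <- rnorm(5)                    # five random standard normal values

# The same prefixes work for other distributions, e.g. the exponential
exp_round_trip <- qexp(pexp(2, rate = 0.5), rate = 0.5)  # recovers 2

print(c(density_at_0, prob_below_1, cutoff, exp_round_trip))
```

The round trip through p and then q returns the starting value, which is a handy sanity check when working with an unfamiliar distribution.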

3. Normal Distribution

r
library(ggplot2)

# Normal distribution parameters
mu <- 75000   # Mean salary
sigma <- 15000  # Standard deviation

# PDF — Density at a specific value
dnorm(75000, mean=mu, sd=sigma)  # Maximum (at mean)
dnorm(90000, mean=mu, sd=sigma)  # Lower (away from mean)

# CDF — Cumulative probability
pnorm(90000, mean=mu, sd=sigma)  # P(salary ≤ 90000) = 0.841
pnorm(60000, mean=mu, sd=sigma)  # P(salary ≤ 60000) = 0.159

# P(salary between 60K and 90K)
pnorm(90000, mu, sigma) - pnorm(60000, mu, sigma)  # 0.683 (68-95-99.7 rule!)

# Quantiles
qnorm(0.90, mean=mu, sd=sigma)  # 90th percentile ≈ $94,223
qnorm(0.95, mean=mu, sd=sigma)  # 95th percentile
qnorm(0.975, mean=0, sd=1)      # 1.96 (used in 95% CI)

# Random generation
set.seed(42)
sample_salaries <- rnorm(1000, mean=mu, sd=sigma)
mean(sample_salaries)  # ≈ 75000
sd(sample_salaries)    # ≈ 15000

# Visualize normal distribution
x <- seq(30000, 120000, length.out=300)
df_norm <- data.frame(x=x, y=dnorm(x, mu, sigma))

ggplot(df_norm, aes(x, y)) +
  geom_line(color="#1565C0", linewidth=1.5) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_area(data=subset(df_norm, x >= 60000 & x <= 90000),
             aes(y=y), fill="#90CAF9", alpha=0.4) +
  geom_vline(xintercept=mu, color="red", linetype="dashed") +
  scale_x_continuous(labels=function(x) paste0("$",x/1000,"K")) +
  labs(title="Salary Distribution (Normal)",
       subtitle="Shaded area = P(60K < X < 90K) ≈ 68%",
       x="Salary", y="Density") +
  theme_minimal()
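
The 68% figure in the code above is one instance of the 68-95-99.7 rule. A minimal sketch, reusing the chapter's salary parameters, checks all three bands and shows that standardizing the bounds to z-scores gives the same answer. The helper name `within_k_sd` is hypothetical.

```r
mu <- 75000
sigma <- 15000

# Probability that a normal value lies within k standard deviations of the mean
within_k_sd <- function(k, mean, sd) {
  pnorm(mean + k * sd, mean, sd) - pnorm(mean - k * sd, mean, sd)
}

coverage <- sapply(1:3, within_k_sd, mean = mu, sd = sigma)
print(round(coverage, 4))  # 0.6827 0.9545 0.9973

# Standardizing the bounds gives the same probability on the z scale
z_low  <- (60000 - mu) / sigma   # -1
z_high <- (90000 - mu) / sigma   #  1
print(pnorm(z_high) - pnorm(z_low))
```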

4. Binomial Distribution

r
# Binomial: n trials, probability p of success
# Example: 15 customers, 30% purchase probability
n <- 15; p <- 0.30

# P(exactly 5 buy) = P(X = 5)
dbinom(5, size=n, prob=p)   # 0.2061

# P(at most 5 buy) = P(X ≤ 5)
pbinom(5, size=n, prob=p)   # 0.7216

# P(more than 5 buy) = P(X > 5)
1 - pbinom(5, size=n, prob=p)  # 0.2784

# Expected value and variance
cat("E[X] =", n * p, "\n")         # 4.5
cat("Var[X] =", n * p * (1-p), "\n")  # 3.15
cat("SD[X] =", sqrt(n * p * (1-p)), "\n")  # 1.77

# Random binomial sample
rbinom(10, size=n, prob=p)

# Visualize (uses ggplot2, loaded with library(ggplot2) in the Normal Distribution section)
x_binom <- 0:15
df_binom <- data.frame(x=x_binom, prob=dbinom(x_binom, n, p))
ggplot(df_binom, aes(x, prob)) +
  geom_col(aes(fill=x==5), show.legend=FALSE) +
  scale_fill_manual(values=c("FALSE"="#90CAF9","TRUE"="#1565C0")) +
  labs(title="Binomial Distribution (n=15, p=0.3)",
       subtitle="Dark blue bar: P(X=5) = 0.206",
       x="Number of Purchases", y="Probability") +
  theme_minimal()
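
To see what dbinom() and pbinom() compute, the sketch below reproduces P(X = 5) from the binomial formula C(n, k) p^k (1 - p)^(n - k) and rebuilds the CDF as a running sum of the PMF. The variable names are illustrative.

```r
n <- 15; p <- 0.30; k <- 5

# PMF by hand: choose(n, k) * p^k * (1 - p)^(n - k)
manual_pmf <- choose(n, k) * p^k * (1 - p)^(n - k)
print(all.equal(manual_pmf, dbinom(k, size = n, prob = p)))  # TRUE

# CDF as the sum of the PMF over 0..k
manual_cdf <- sum(dbinom(0:k, size = n, prob = p))
print(all.equal(manual_cdf, pbinom(k, size = n, prob = p)))  # TRUE

# The PMF over all possible outcomes 0..n sums to 1
total_prob <- sum(dbinom(0:n, size = n, prob = p))
print(total_prob)
```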

5. Poisson Distribution

r
# Poisson: rate λ events per time period (count data)
# Example: λ = 4 customer arrivals per hour
lambda <- 4

# P(exactly 6 arrive) = P(X = 6)
dpois(6, lambda=lambda)   # 0.1042

# P(at most 4 arrive)
ppois(4, lambda=lambda)   # 0.6288

# P(more than 6 arrive)
1 - ppois(6, lambda=lambda)  # 0.1107

# Random Poisson
set.seed(42)
hourly_arrivals <- rpois(100, lambda=lambda)  # 100 hours
mean(hourly_arrivals)  # ≈ 4  (mean = lambda)
var(hourly_arrivals)   # ≈ 4  (Poisson: mean = variance!)

cat("Mean=Variance for Poisson:", round(mean(hourly_arrivals),2),
    "≈", round(var(hourly_arrivals),2), "\n")

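The simulated mean and variance above only approximate 4 because they come from a random sample. As a complementary sketch, the code below checks dpois() against the Poisson formula λ^k e^(-λ) / k! and computes the mean and variance directly from the PMF. Truncating the support at 100 is an assumption made for illustration; the tail beyond it is negligible for λ = 4.

```r
lambda <- 4

# PMF by hand: lambda^k * exp(-lambda) / k!
k <- 6
manual_pmf <- lambda^k * exp(-lambda) / factorial(k)
print(all.equal(manual_pmf, dpois(k, lambda = lambda)))    # TRUE

# P(X = 0) reduces to exp(-lambda)
print(all.equal(dpois(0, lambda = lambda), exp(-lambda)))  # TRUE

# Mean and variance computed from the PMF itself (support truncated at 100)
support  <- 0:100
pmf      <- dpois(support, lambda = lambda)
pmf_mean <- sum(support * pmf)
pmf_var  <- sum((support - pmf_mean)^2 * pmf)
print(c(mean = pmf_mean, variance = pmf_var))  # both approximately 4
```
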
6. Common Mistakes

  • pnorm() defaults to lower.tail=TRUE: pnorm(90000, 75000, 15000) gives P(X ≤ 90000). For P(X > 90000), use pnorm(90000, ..., lower.tail=FALSE) or 1 - pnorm(...).
  • Binomial uses prob=, not p=: The argument is named prob. dbinom(5, 15, p=0.3) works only because R partially matches p to prob; writing prob=0.3 is explicit and clearer.
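
Both points can be checked directly. The sketch below reuses the chapter's salary parameters. The far-tail comparison at z = 10 is an added illustration of why lower.tail=FALSE can be the safer of the two options.

```r
mu <- 75000; sigma <- 15000

# Two equivalent ways to get the upper tail P(X > 90000)
upper_a <- 1 - pnorm(90000, mean = mu, sd = sigma)
upper_b <- pnorm(90000, mean = mu, sd = sigma, lower.tail = FALSE)
print(c(upper_a, upper_b))  # both about 0.1587

# Far in the tail, 1 - pnorm() rounds to 0 while lower.tail = FALSE does not
far_a <- 1 - pnorm(10)
far_b <- pnorm(10, lower.tail = FALSE)
print(c(far_a, far_b))

# Partial matching: p = is silently expanded to prob =
same_result <- identical(dbinom(5, 15, p = 0.3),
                         dbinom(5, size = 15, prob = 0.3))
print(same_result)  # TRUE
```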

7. MCQs

Question 1

pnorm(x, mean, sd) computes?

Question 2

qnorm(0.975, 0, 1) returns approximately?

Question 3

Binomial distribution requires?

Question 4

Poisson distribution property: mean equals?

Question 5

rnorm(1000, 0, 1) generates?

Question 6

dnorm(mean, mean, sd) is?

Question 7

1 - pbinom(5, 15, 0.3) computes?

Question 8

68-95-99.7 rule says?

Question 9

dpois(0, lambda=3) gives?

Question 10

set.seed(42) before rnorm() ensures?

8. Interview Questions

  • Q: What is the difference between dnorm(), pnorm(), qnorm(), and rnorm()?
  • Q: When would you use a Poisson vs Binomial distribution?
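
One point that bears on the second question: when the number of trials is large and the success probability is small, a binomial count is closely approximated by a Poisson distribution with λ = n × p. The sketch below compares the two PMFs. The values n = 1000 and p = 0.004 are assumptions chosen for illustration and do not come from the chapter.

```r
n <- 1000; p <- 0.004     # many trials, rare success (illustrative values)
lambda <- n * p           # matching Poisson rate: 4

k <- 0:12
comparison <- data.frame(
  k        = k,
  binomial = dbinom(k, size = n, prob = p),
  poisson  = dpois(k, lambda = lambda)
)
print(round(comparison, 4))

# Largest absolute gap between the two PMFs over k = 0..12
max_gap <- max(abs(comparison$binomial - comparison$poisson))
print(max_gap)
```

When n is small or p is not close to 0, as in the chapter's 15-customer example, the gap widens and the binomial should be used directly.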

9. Summary

R distribution pattern: d (density), p (CDF), q (quantile), r (random). Normal: dnorm/pnorm/qnorm/rnorm. Binomial: fixed trials + constant probability. Poisson: count data with mean=variance. Key: pnorm(q) = P(X≤q), 1-pnorm(q) = P(X>q). 68-95-99.7 rule: ±1/2/3 SD covers 68%/95%/99.7% of normal data. Always set.seed() for reproducibility.

10. Next Chapter Recommendation

In Chapter 19: Hypothesis Testing, we apply statistical inference — t-tests, chi-squared tests, and p-values — to draw evidence-based conclusions from data.
