CHAPTER 19 Beginner

Hypothesis Testing

Updated: May 18, 2026

Hypothesis Testing in R

1. Chapter Introduction

Hypothesis testing is the formal statistical framework for making evidence-based decisions. Is the new drug effective? Are the salary differences real? This chapter builds the complete hypothesis testing toolkit in R.

2. Hypothesis Testing Framework

text
HYPOTHESIS TESTING STEPS:
  1. State H₀ (null hypothesis — "no effect" / "no difference")
  2. State H₁ (alternative hypothesis — what you want to show)
  3. Choose significance level α (usually 0.05 = 5%)
  4. Collect data
  5. Compute test statistic
  6. Compute p-value
  7. Decision:
     • p < α → Reject H₀ (evidence for H₁)
     • p ≥ α → Fail to reject H₀ (insufficient evidence)

p-value interpretation:
  p-value = P(data at least as extreme as observed | H₀ is true)
  Small p-value → data is unlikely under H₀ → reject H₀

Common thresholds:
  p < 0.001 → Very strong evidence against H₀ (***)
  p < 0.01  → Strong evidence (**)
  p < 0.05  → Significant evidence (*)
  p ≥ 0.05  → Not significant (ns)
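
Steps 5 to 7 above can be sketched by hand for a one-sample, two-sided t-test. The scores in `x` and the null value `mu0` are made-up illustration values, not data from this chapter; the last lines cross-check the manual result against R's built-in `t.test()`.

```r
# Steps 5-7 by hand for a one-sample, two-sided t-test.
# x and mu0 are hypothetical illustration values.
x     <- c(52, 48, 55, 60, 47, 53, 58, 51)
mu0   <- 50
alpha <- 0.05

n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # step 5: test statistic
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)     # step 6: two-sided p-value

# Step 7: compare the p-value with alpha
decision <- if (p_val < alpha) "Reject H0" else "Fail to reject H0"
cat(sprintf("t = %.3f, p = %.4f -> %s\n", t_stat, p_val, decision))

# Cross-check: the built-in test should report the same p-value
builtin <- t.test(x, mu = mu0)
stopifnot(isTRUE(all.equal(p_val, builtin$p.value)))
```

Doubling the lower-tail probability of `-abs(t_stat)` counts both tails, which is what "at least as extreme" means for a two-sided alternative.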

3. One-Sample t-Test

r
# H₀: μ = 70000 (population mean salary = $70,000)
# H₁: μ ≠ 70000 (two-tailed)
set.seed(42)
sample_salaries <- rnorm(30, mean=75000, sd=12000)

result <- t.test(sample_salaries, mu=70000, alternative="two.sided")
print(result)
cat("\n")
cat(sprintf("t-statistic: %.3f\n", result$statistic))
cat(sprintf("p-value:     %.4f\n", result$p.value))
cat(sprintf("95%% CI:      [%.0f, %.0f]\n", result$conf.int[1], result$conf.int[2]))
cat(sprintf("Decision:    %s H₀\n", ifelse(result$p.value < 0.05, "REJECT", "FAIL TO REJECT")))

# One-tailed: H₁: μ > 70000
t.test(sample_salaries, mu=70000, alternative="greater")
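
To see how the `alternative` argument changes the result, the sketch below runs the same sample through all three settings. It regenerates the simulated salaries with the same seed so it can run on its own; the variable names `two_sided`, `greater`, and `less` are introduced here for illustration.

```r
set.seed(42)
sample_salaries <- rnorm(30, mean = 75000, sd = 12000)

two_sided <- t.test(sample_salaries, mu = 70000, alternative = "two.sided")
greater   <- t.test(sample_salaries, mu = 70000, alternative = "greater")
less      <- t.test(sample_salaries, mu = 70000, alternative = "less")

# All three tests share one t statistic; only the tail area counted changes.
cat(sprintf("two-sided p: %.4f\n", two_sided$p.value))
cat(sprintf("greater   p: %.4f\n", greater$p.value))
cat(sprintf("less      p: %.4f\n", less$p.value))

# The two one-sided p-values sum to 1, and the two-sided p-value
# is twice the smaller of them.
stopifnot(isTRUE(all.equal(greater$p.value + less$p.value, 1)))
stopifnot(isTRUE(all.equal(two_sided$p.value,
                           2 * min(greater$p.value, less$p.value))))
```

A one-tailed test has more power in the stated direction, so the direction should be chosen before looking at the data.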

4. Two-Sample t-Test

r
# Compare two groups
set.seed(42)
group_A <- rnorm(25, mean=78000, sd=15000)  # Department A
group_B <- rnorm(25, mean=72000, sd=14000)  # Department B

# H₀: μ_A = μ_B (no salary difference between departments)
result <- t.test(group_A, group_B, var.equal=FALSE)  # Welch's t-test
cat(sprintf("Dept A mean: $%.0f\n", mean(group_A)))
cat(sprintf("Dept B mean: $%.0f\n", mean(group_B)))
cat(sprintf("p-value: %.4f → %s\n", result$p.value,
             ifelse(result$p.value < 0.05, "Significant difference!", "No significant difference")))

# Effect size (Cohen's d)
cohen_d <- (mean(group_A) - mean(group_B)) /
           sqrt((var(group_A) + var(group_B)) / 2)
cat(sprintf("Cohen&#039;s d: %.3f (%s effect)\n", cohen_d,
             ifelse(abs(cohen_d) > 0.8, "Large",
             ifelse(abs(cohen_d) > 0.5, "Medium", "Small"))))

# Paired t-test (before/after measurements)
before <- c(72, 68, 75, 80, 65, 70, 78)
after  <- c(78, 74, 80, 85, 72, 76, 83)
paired_result <- t.test(before, after, paired=TRUE, alternative="less")
cat("\nPaired t-test (training effectiveness):\n")
cat(sprintf("Mean improvement: %.1f points\n", mean(after - before)))
cat(sprintf("p-value: %.4f\n", paired_result$p.value))
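
A paired t-test is equivalent to a one-sample t-test on the per-subject differences. The sketch below checks that on the same before/after scores, then shows what happens if the pairing is ignored; `one_samp` and `unpaired` are names introduced here for illustration.

```r
before <- c(72, 68, 75, 80, 65, 70, 78)
after  <- c(78, 74, 80, 85, 72, 76, 83)

# Paired test: H1 is that mean(before - after) < 0, i.e. scores went up
paired <- t.test(before, after, paired = TRUE, alternative = "less")

# Same test written as a one-sample t-test on the differences
one_samp <- t.test(before - after, mu = 0, alternative = "less")
stopifnot(isTRUE(all.equal(unname(paired$statistic), unname(one_samp$statistic))))
stopifnot(isTRUE(all.equal(paired$p.value, one_samp$p.value)))

# Ignoring the pairing (Welch two-sample test) throws away the
# within-subject structure and gives a larger p-value on this data.
unpaired <- t.test(before, after, alternative = "less")
cat(sprintf("paired p:   %.6f\n", paired$p.value))
cat(sprintf("unpaired p: %.6f\n", unpaired$p.value))
```

Pairing helps here because every subject improved by a similar amount, so the differences vary far less than the raw scores do.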

5. Chi-Squared and ANOVA

r
# ─── CHI-SQUARED TEST (categorical variables) ─────────
# H₀: No relationship between gender and department preference
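# matrix() fills by column by default; byrow=TRUE assumes the counts are
# listed one gender at a time (Male: 45, 30, 25; Female: 35, 40, 30)
observed <- matrix(c(45, 30, 25, 35, 40, 30), nrow=2, byrow=TRUE,
                   dimnames=list(c("Male","Female"), c("Tech","Sales","HR")))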
chi_result <- chisq.test(observed)
cat(sprintf("Chi-squared: %.3f, df=%d, p=%.4f\n",
             chi_result$statistic, chi_result$parameter, chi_result$p.value))

# ─── ONE-WAY ANOVA (compare 3+ group means) ──────────
set.seed(42)
dept_salaries <- data.frame(
  salary = c(rnorm(20, 85000, 10000), rnorm(20, 72000, 12000), rnorm(20, 78000, 11000)),
  dept   = rep(c("IT", "HR", "Finance"), each=20)
)
anova_result <- aov(salary ~ dept, data=dept_salaries)
summary(anova_result)
# If significant: which groups differ?
TukeyHSD(anova_result)  # Post-hoc pairwise comparisons
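
The chi-squared statistic can also be computed by hand, which makes it clearer what the test measures. This is a minimal sketch that assumes the six counts are entered row by row (one row per gender); `expected`, `chi_sq`, `df`, and `p_val` are names introduced here for illustration.

```r
# Contingency table, counts entered row by row (an assumption of this sketch)
observed <- matrix(c(45, 30, 25,
                     35, 40, 30),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Male", "Female"), c("Tech", "Sales", "HR")))

# Expected count per cell under independence:
# row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_val  <- pchisq(chi_sq, df = df, lower.tail = FALSE)
cat(sprintf("Chi-squared: %.3f, df=%d, p=%.4f\n", chi_sq, df, p_val))

# Cross-check against the built-in test (no continuity correction
# is applied for tables larger than 2x2)
builtin <- chisq.test(observed)
stopifnot(isTRUE(all.equal(chi_sq, unname(builtin$statistic))))
stopifnot(isTRUE(all.equal(p_val, builtin$p.value)))
```

The further the observed counts sit from the counts expected under independence, the larger the statistic and the smaller the p-value.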

6. Mini Project: Survey Analysis Tool

r
# ─── SURVEY ANALYSIS TOOL ────────────────────────────
set.seed(123)
n <- 100
survey <- data.frame(
  gender        = sample(c("Male","Female"), n, replace=TRUE, prob=c(0.55,0.45)),
  dept          = sample(c("IT","HR","Finance","Marketing"), n, replace=TRUE),
  job_satisfact = round(runif(n, 1, 10), 1),
  before_train  = round(runif(n, 60, 80)),
  after_train   = round(runif(n, 65, 92))
)

cat("=== SURVEY ANALYSIS REPORT ===\n\n")

# 1: Is satisfaction different by department?
cat("1. ANOVA: Satisfaction by Department\n")
anova_sat <- aov(job_satisfact ~ dept, data=survey)
p_anova <- summary(anova_sat)[[1]][["Pr(>F)"]][1]
cat(sprintf("   F-test p-value: %.4f — %s\n\n", p_anova,
             ifelse(p_anova < 0.05, "Significant differences!", "No significant differences")))

# 2: Does training improve performance?
cat("2. Paired t-test: Before vs After Training\n")
t_res <- t.test(survey$before_train, survey$after_train, paired=TRUE, alternative="less")
cat(sprintf("   Mean improvement: %.2f points\n", mean(survey$after_train - survey$before_train)))
cat(sprintf("   p-value: %.4f — %s\n\n", t_res$p.value,
             ifelse(t_res$p.value < 0.05, "Training effective!", "Training not significant")))

# 3: Gender vs Department independence
cat("3. Chi-squared: Gender vs Department\n")
chi_res <- chisq.test(table(survey$gender, survey$dept))
cat(sprintf("   Chi² = %.3f, p = %.4f — %s\n", chi_res$statistic, chi_res$p.value,
             ifelse(chi_res$p.value < 0.05, "Association exists!", "No evidence of association")))

7. Common Mistakes

  • p-value ≠ probability H₀ is true: A p-value of 0.04 means "if H₀ were true, there would be a 4% chance of seeing data at least this extreme." It does NOT mean there is a 4% probability that H₀ is true or a 96% probability that H₁ is true.
  • Multiple testing problem: Running 20 independent tests at α=0.05 when every H₀ is true yields about 1 false positive on average (20 × 0.05). Use a correction such as Bonferroni: p.adjust(p_values, method="bonferroni").
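
The multiple testing point can be made concrete with a short sketch. The five p-values below are hypothetical and chosen only to show how the Bonferroni adjustment changes which results stay under α; `fwer` is a name introduced here for the chance of at least one false positive.

```r
# Hypothetical p-values from five separate tests (illustration only)
p_values <- c(0.003, 0.012, 0.041, 0.049, 0.20)
alpha    <- 0.05

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_values, method = "bonferroni")

print(data.frame(raw         = p_values,
                 bonferroni  = p_bonf,
                 reject_raw  = p_values < alpha,
                 reject_bonf = p_bonf < alpha))

# Chance of at least one false positive across m independent tests
# when every null hypothesis is true
m    <- 20
fwer <- 1 - (1 - alpha)^m
cat(sprintf("P(at least one false positive in %d tests) = %.3f\n", m, fwer))
```

Bonferroni is conservative: it controls the chance of any false positive at the cost of some power, which is why fewer raw results remain significant after adjustment.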

8. MCQs

Question 1

p-value < 0.05 means?

Question 2

Two-sample t-test var.equal=FALSE runs?

Question 3

Paired t-test is used when?

Question 4

Chi-squared test is for?

Question 5

ANOVA tests?

Question 6

TukeyHSD() after ANOVA performs?

Question 7

Cohen's d = 0.8+ indicates?

Question 8

Bonferroni correction addresses?

Question 9

Type I error = ?

Question 10

Confidence interval that doesn't include H₀ value indicates?

9. Interview Questions

  • Q: What is the difference between a one-tailed and two-tailed t-test?
  • Q: What is the multiple testing problem and how do you address it?

10. Summary

Hypothesis testing: H₀ vs H₁, choose α, compute p-value, decide. One-sample t-test: compare mean to value. Two-sample t-test (Welch): compare two group means. Paired t-test: before/after same subjects. Chi-squared: categorical association. ANOVA + TukeyHSD: 3+ group comparison. Cohen's d for effect size. Bonferroni for multiple comparisons. p < 0.05 → reject H₀, but report effect size too.

11. Next Chapter Recommendation

In Chapter 20: Correlation and Regression Analysis, we model relationships between variables and build predictive models.
