CHAPTER 19 Beginner

Hypothesis Testing

Updated: May 18, 2026

# CHAPTER 19

Hypothesis Testing in R

1. Chapter Introduction

Hypothesis testing is the formal statistical framework for making evidence-based decisions. Is the new drug effective? Are the salary differences real? This chapter builds the complete hypothesis testing toolkit in R.

2. Hypothesis Testing Framework

HYPOTHESIS TESTING STEPS:
  1. State H₀ (null hypothesis — "no effect" / "no difference")
  2. State H₁ (alternative hypothesis — what you want to show)
  3. Choose significance level α (usually 0.05 = 5%)
  4. Collect data
  5. Compute test statistic
  6. Compute p-value
  7. Decision:
     • p < α → Reject H₀ (evidence for H₁)
     • p ≥ α → Fail to reject H₀ (insufficient evidence)

p-value interpretation:
  p-value = P(observing this data or more extreme | H₀ is true)
  Small p-value → data is unlikely under H₀ → reject H₀

Common thresholds:
  p < 0.001 → Very strong evidence against H₀ (***)
  p < 0.01  → Strong evidence (**)
  p < 0.05  → Significant evidence (*)
  p ≥ 0.05  → Not significant (ns)
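Steps 5 to 7 above can be carried out by hand before relying on a built-in test. The sketch below uses a small made-up sample and a hypothetical null value of 50; the helper `stars()` is an illustrative name, not a base R function.

```r
# Hypothetical sample and null value, chosen only for illustration
x     <- c(52, 55, 49, 58, 51, 54, 57, 53)
mu0   <- 50      # H0: mu = 50
alpha <- 0.05    # significance level (step 3)

# Step 5: test statistic t = (sample mean - mu0) / (sd / sqrt(n))
n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Step 6: two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

# Step 7: decision
decision <- if (p_val < alpha) "Reject H0" else "Fail to reject H0"

# Illustrative helper: map a p-value to the star notation listed above
stars <- function(p) {
  if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else "ns"
}

cat(sprintf("t = %.3f, p = %.4f (%s) -> %s\n",
            t_stat, p_val, stars(p_val), decision))
```

The hand-computed p-value should agree with `t.test(x, mu = mu0)$p.value`, which is a quick way to check that the formula was applied correctly.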

3. One-Sample t-Test

# H₀: μ = 70000 (population mean salary = $70,000)
# H₁: μ ≠ 70000 (two-tailed)
set.seed(42)
sample_salaries <- rnorm(30, mean=75000, sd=12000)

result <- t.test(sample_salaries, mu=70000, alternative="two.sided")
print(result)
cat("\n")
cat(sprintf("t-statistic: %.3f\n", result$statistic))
cat(sprintf("p-value:     %.4f\n", result$p.value))
cat(sprintf("95%% CI:      [%.0f, %.0f]\n", result$conf.int[1], result$conf.int[2]))
cat(sprintf("Decision:    %s H₀\n", ifelse(result$p.value < 0.05, "REJECT", "FAIL TO REJECT")))

# One-tailed: H₁: μ > 70000
t.test(sample_salaries, mu=70000, alternative="greater")
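The one-tailed and two-tailed tests above share the same t-statistic and differ only in which tail area is reported. The following sketch, reusing the chapter's simulated salaries, checks that relationship and also shows the link between the 95% confidence interval and the two-sided test at α = 0.05. Variable names such as `p_check` and `ci_excludes_mu0` are introduced here for illustration.

```r
set.seed(42)
sample_salaries <- rnorm(30, mean = 75000, sd = 12000)

two_sided <- t.test(sample_salaries, mu = 70000, alternative = "two.sided")
greater   <- t.test(sample_salaries, mu = 70000, alternative = "greater")
less      <- t.test(sample_salaries, mu = 70000, alternative = "less")

# The two one-sided p-values are complementary tail areas, so they sum to 1,
# and the two-sided p-value is twice the smaller of them
p_check <- 2 * min(greater$p.value, less$p.value)
cat(sprintf("two-sided p: %.4f, 2 * min(one-sided): %.4f\n",
            two_sided$p.value, p_check))

# Duality: the 95% CI excludes 70000 exactly when the two-sided p-value < 0.05
ci <- two_sided$conf.int
ci_excludes_mu0 <- (70000 < ci[1]) || (70000 > ci[2])
cat("CI excludes 70000:", ci_excludes_mu0,
    "| p < 0.05:", two_sided$p.value < 0.05, "\n")
```

A one-tailed test is only appropriate when the direction of the alternative was fixed before looking at the data; choosing the tail afterwards halves the p-value without justification.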

4. Two-Sample t-Test

# Compare two groups
set.seed(42)
group_A <- rnorm(25, mean=78000, sd=15000)  # Department A
group_B <- rnorm(25, mean=72000, sd=14000)  # Department B

# H₀: μ_A = μ_B (no salary difference between departments)
result <- t.test(group_A, group_B, var.equal=FALSE)  # Welch's t-test
cat(sprintf("Dept A mean: $%.0f\n", mean(group_A)))
cat(sprintf("Dept B mean: $%.0f\n", mean(group_B)))
cat(sprintf("p-value: %.4f → %s\n", result$p.value,
             ifelse(result$p.value < 0.05, "Significant difference!", "No significant difference")))

# Effect size (Cohen's d)
cohen_d <- (mean(group_A) - mean(group_B)) /
           sqrt((var(group_A) + var(group_B)) / 2)
cat(sprintf("Cohen's d: %.3f (%s effect)\n", cohen_d,
             ifelse(abs(cohen_d) > 0.8, "Large",
             ifelse(abs(cohen_d) > 0.5, "Medium", "Small"))))

# Paired t-test (before/after measurements)
before <- c(72, 68, 75, 80, 65, 70, 78)
after  <- c(78, 74, 80, 85, 72, 76, 83)
paired_result <- t.test(before, after, paired=TRUE, alternative="less")
cat("\nPaired t-test (training effectiveness):\n")
cat(sprintf("Mean improvement: %.1f points\n", mean(after - before)))
cat(sprintf("p-value: %.4f\n", paired_result$p.value))
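It can help to see why the paired test differs from the two-sample test: a paired t-test is a one-sample t-test applied to the within-subject differences. The sketch below checks this with the before/after scores from the listing above; `on_diff` is a name introduced here for illustration.

```r
before <- c(72, 68, 75, 80, 65, 70, 78)
after  <- c(78, 74, 80, 85, 72, 76, 83)

# Paired test as used in the chapter (H1: mean of before - after is below 0)
paired <- t.test(before, after, paired = TRUE, alternative = "less")

# Equivalent one-sample test on the per-subject differences
on_diff <- t.test(before - after, mu = 0, alternative = "less")

cat(sprintf("paired:      t = %.3f, p = %.6f\n",
            paired$statistic, paired$p.value))
cat(sprintf("differences: t = %.3f, p = %.6f\n",
            on_diff$statistic, on_diff$p.value))
```

Because only the differences enter the calculation, variation between subjects drops out, which is why pairing can detect an effect that an unpaired comparison of the same numbers might miss.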

5. Chi-Squared and ANOVA

# ─── CHI-SQUARED TEST (categorical variables) ─────────
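# H₀: No relationship between gender and department preference
# Note: matrix() fills column by column, so the rows here are
#   Male = 45, 25, 40 and Female = 30, 35, 30 (Tech, Sales, HR)
observed <- matrix(c(45, 30, 25, 35, 40, 30), nrow=2,
                   dimnames=list(c("Male","Female"), c("Tech","Sales","HR")))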
chi_result <- chisq.test(observed)
cat(sprintf("Chi-squared: %.3f, df=%d, p=%.4f\n",
             chi_result$statistic, chi_result$parameter, chi_result$p.value))

# ─── ONE-WAY ANOVA (compare 3+ group means) ──────────
set.seed(42)
dept_salaries <- data.frame(
  salary = c(rnorm(20, 85000, 10000), rnorm(20, 72000, 12000), rnorm(20, 78000, 11000)),
  dept   = rep(c("IT", "HR", "Finance"), each=20)
)
anova_result <- aov(salary ~ dept, data=dept_salaries)
summary(anova_result)
# If significant: which groups differ?
TukeyHSD(anova_result)  # Post-hoc pairwise comparisons
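To see what `chisq.test()` computes, the statistic can be rebuilt from the expected counts under independence. This is a minimal sketch using the same table as above; the names `expected`, `chi_sq`, and `df` are introduced here. It also checks the common rule of thumb that every expected count should be at least 5.

```r
observed <- matrix(c(45, 30, 25, 35, 40, 30), nrow = 2,
                   dimnames = list(c("Male", "Female"), c("Tech", "Sales", "HR")))

# Expected count under H0 = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df     <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution
p_val <- pchisq(chi_sq, df = df, lower.tail = FALSE)

cat(sprintf("Chi-squared: %.3f, df=%d, p=%.4f\n", chi_sq, df, p_val))
cat("All expected counts >= 5:", all(expected >= 5), "\n")
```

For a 2×2 table `chisq.test()` applies Yates' continuity correction by default, so a hand calculation like this one would only match it there with `correct=FALSE`.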

6. Mini Project: Survey Analysis Tool

# ─── SURVEY ANALYSIS TOOL ────────────────────────────
set.seed(123)
n <- 100
survey <- data.frame(
  gender        = sample(c("Male","Female"), n, replace=TRUE, prob=c(0.55,0.45)),
  dept          = sample(c("IT","HR","Finance","Marketing"), n, replace=TRUE),
  job_satisfact = round(runif(n, 1, 10), 1),
  before_train  = round(runif(n, 60, 80)),
  after_train   = round(runif(n, 65, 92))
)

cat("=== SURVEY ANALYSIS REPORT ===\n\n")

# 1: Is satisfaction different by department?
cat("1. ANOVA: Satisfaction by Department\n")
anova_sat <- aov(job_satisfact ~ dept, data=survey)
p_anova <- summary(anova_sat)[[1]][["Pr(>F)"]][1]
cat(sprintf("   F-test p-value: %.4f — %s\n\n", p_anova,
             ifelse(p_anova < 0.05, "Significant differences!", "No significant differences")))

# 2: Does training improve performance?
cat("2. Paired t-test: Before vs After Training\n")
t_res <- t.test(survey$before_train, survey$after_train, paired=TRUE, alternative="less")
cat(sprintf("   Mean improvement: %.2f points\n", mean(survey$after_train - survey$before_train)))
cat(sprintf("   p-value: %.4f — %s\n\n", t_res$p.value,
             ifelse(t_res$p.value < 0.05, "Training effective!", "Training not significant")))

# 3: Gender vs Department independence
cat("3. Chi-squared: Gender vs Department\n")
chi_res <- chisq.test(table(survey$gender, survey$dept))
cat(sprintf("   Chi² = %.3f, p = %.4f — %s\n", chi_res$statistic, chi_res$p.value,
             ifelse(chi_res$p.value < 0.05, "Association exists!", "No evidence of association")))
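The report above repeats the same "p-value plus verdict" pattern for each test. One possible way to tidy it is a small helper that turns any `htest` object (the class returned by `t.test()` and `chisq.test()`) into a one-row data frame. `summarise_test()` is a hypothetical helper written for this sketch, not a base R function, and the `counts` table is made up; `aov()` results are not `htest` objects and would need separate handling.

```r
# Hypothetical helper: one-row summary of an "htest" result
summarise_test <- function(test, alpha = 0.05) {
  stopifnot(inherits(test, "htest"))
  data.frame(
    method    = trimws(test$method),
    statistic = unname(test$statistic),
    p_value   = test$p.value,
    reject_h0 = test$p.value < alpha,
    stringsAsFactors = FALSE
  )
}

# Small inputs used only to exercise the helper
before <- c(72, 68, 75, 80, 65, 70, 78)
after  <- c(78, 74, 80, 85, 72, 76, 83)
counts <- matrix(c(30, 20, 25, 25), nrow = 2)  # made-up 2x2 table

report <- rbind(
  summarise_test(t.test(before, after, paired = TRUE, alternative = "less")),
  summarise_test(chisq.test(counts))
)
print(report)
```

Collecting results in a data frame also makes it straightforward to add an adjusted p-value column later with `p.adjust()`.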

7. Common Mistakes

p-value ≠ probability H₀ is true: A p-value of 0.04 means "if H₀ were true, there would be a 4% chance of seeing data at least this extreme." It does NOT mean there is a 4% probability that H₀ is true, or a 96% probability that H₁ is true.

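Multiple testing problem: Running 20 independent t-tests at α=0.05 when every H₀ is true yields about 1 false positive on average (20 × 0.05), and roughly a 64% chance of at least one. Use a correction such as Bonferroni: p.adjust(p_values, method="bonferroni").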
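The multiple testing problem can be illustrated with a short simulation in which H₀ is true for every test, so any "significant" result is a false positive. This is a sketch under assumed settings (seed, group size of 30, standard normal data), none of which come from the chapter.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n_tests <- 20
alpha   <- 0.05

# 20 two-sample t-tests where both groups come from the same distribution,
# so H0 is true in every test
p_values <- replicate(n_tests, t.test(rnorm(30), rnorm(30))$p.value)

# Chance of at least one false positive across 20 independent tests
fwer <- 1 - (1 - alpha)^n_tests
cat(sprintf("P(at least one false positive): %.3f\n", fwer))

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_values, method = "bonferroni")

cat("Significant before correction:", sum(p_values < alpha), "\n")
cat("Significant after Bonferroni: ", sum(p_bonf < alpha), "\n")
```

Bonferroni is conservative; `p.adjust()` also offers less strict methods such as `"holm"` and `"BH"`, which may be preferable when many tests are run.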

8. MCQs

Question 1

p-value < 0.05 means?

Question 2

Two-sample t-test `var.equal=FALSE` runs?

Question 3

Paired t-test is used when?

Question 4

Chi-squared test is for?

Question 5

ANOVA tests?

Question 6

`TukeyHSD()` after ANOVA performs?

Question 7

Cohen's d = 0.8+ indicates?

Question 8

Bonferroni correction addresses?

Question 9

Type I error = ?

Question 10

Confidence interval that doesn't include H₀ value indicates?

9. Interview Questions

Q: What is the difference between a one-tailed and two-tailed t-test?

Q: What is the multiple testing problem and how do you address it?

10. Summary

Hypothesis testing: H₀ vs H₁, choose α, compute p-value, decide. One-sample t-test: compare mean to value. Two-sample t-test (Welch): compare two group means. Paired t-test: before/after same subjects. Chi-squared: categorical association. ANOVA + TukeyHSD: 3+ group comparison. Cohen's d for effect size. Bonferroni for multiple comparisons. p < 0.05 → reject H₀, but report effect size too.

11. Next Chapter Recommendation

In Chapter 20: Correlation and Regression Analysis, we model relationships between variables and build predictive models.
