CHAPTER 10 Beginner

Box Plots and Statistical Visualization

Updated: May 18, 2026

1. Chapter Introduction

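Box plots (box-and-whisker plots) pack a five-number summary into one compact visualization: minimum, Q1, median, Q3, and maximum, with outliers drawn as individual points. When the 1.5×IQR rule is used, the whiskers stop at the most extreme non-outlier values rather than at the overall minimum and maximum. Box plots are well suited to comparing distributions across several categories side by side.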

2. Box Plot Anatomy

text
Box Plot Structure:

              Outlier ○
                │
  Whisker ─── largest data value ≤ Q3 + 1.5×IQR
                │
  ┌─────────────┐  ← Q3 (75th percentile)
  │  IQR box    │
  │─────────────│  ← Median (Q2, 50th percentile) ★
  │             │
  └─────────────┘  ← Q1 (25th percentile)
                │
  Whisker ─── smallest data value ≥ Q1 - 1.5×IQR
                │
              Outlier ○

IQR = Q3 - Q1  (Interquartile Range = middle 50% spread)
Outlier threshold: values beyond Q1-1.5×IQR or Q3+1.5×IQR
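
The quantities in the diagram can be computed directly with NumPy. The following is a minimal sketch, not a library API: `box_plot_stats` is a hypothetical helper name, and the salary values (in $K) are made up for illustration.

```python
import numpy as np

def box_plot_stats(values, whis=1.5):
    """Compute the quantities a 1.5×IQR box plot draws (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr   # outlier threshold below the box
    upper_fence = q3 + whis * iqr   # outlier threshold above the box
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        # Whiskers end at actual data points, not at the fences themselves
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": np.sort(outliers),
    }

salaries = [52, 58, 60, 61, 63, 65, 66, 70, 72, 140]  # made-up values, in $K
stats = box_plot_stats(salaries)
for name, value in stats.items():
    print(f"{name}: {value}")
```

In this sample Q1 = 60.25 and Q3 = 69, so IQR = 8.75 and the upper fence is 82.125. The value 140 lies beyond that fence and is reported as an outlier, while the upper whisker stops at 72, the largest value still inside the fence.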

3. Box Plot Implementation

python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = {
    'Engineering': np.random.normal(95000, 18000, 200),
    'Marketing':   np.random.normal(72000, 12000, 150),
    'Sales':       np.random.normal(65000, 25000, 180),
    'HR':          np.random.normal(58000, 10000, 100),
    'Finance':     np.random.normal(82000, 15000, 130),
}

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

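# Notched box plot drawn with Axes.boxplot
bp = axes[0].boxplot(list(data.values()),
                      labels=list(data.keys()),  # Matplotlib 3.9+ prefers the name tick_labels
                      patch_artist=True,         # Draw boxes as patches so they can be filled
                      notch=True,      # Notch shows an approximate 95% CI for the median
                      showfliers=True)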

colors = ['#1565C0', '#2E7D32', '#E65100', '#6A1B9A', '#B71C1C']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
for median in bp['medians']:
    median.set_color('white')
    median.set_linewidth(2)

axes[0].set_title('Salary Distribution by Department', fontsize=13, fontweight='bold')
axes[0].set_ylabel('Annual Salary ($)')
axes[0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x/1000:.0f}K'))
axes[0].grid(True, axis='y', alpha=0.3)
axes[0].spines['top'].set_visible(False)
axes[0].spines['right'].set_visible(False)

# Hand-drawn vertical box plot with jittered scatter overlay (strip chart)
depts = list(data.keys())
for i, (dept, values) in enumerate(data.items()):
    # Box
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    whisker_low  = values[values >= q1 - 1.5*iqr].min()
    whisker_high = values[values <= q3 + 1.5*iqr].max()
    axes[1].hlines([q1, q3], i-0.3, i+0.3, colors=colors[i], linewidth=2)
    axes[1].vlines([i-0.3, i+0.3], q1, q3, colors=colors[i], linewidth=2)
    # Median in black: a white line would vanish against the unfilled box
    axes[1].hlines(median, i-0.3, i+0.3, colors='black', linewidth=2.5)
    axes[1].vlines(i, whisker_low, q1, colors=colors[i], linewidth=1.5, linestyles='--')
    axes[1].vlines(i, q3, whisker_high, colors=colors[i], linewidth=1.5, linestyles='--')
    # Scatter overlay
    jitter = np.random.uniform(-0.2, 0.2, len(values))
    axes[1].scatter(i + jitter, values, alpha=0.15, s=15, color=colors[i])

axes[1].set_xticks(range(len(depts)))
axes[1].set_xticklabels(depts, rotation=20)
axes[1].set_title('Box + Strip Chart (shows all points)', fontsize=13, fontweight='bold')
axes[1].set_ylabel('Annual Salary ($)')
axes[1].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x/1000:.0f}K'))
axes[1].grid(True, axis='y', alpha=0.3)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('boxplots.png', dpi=150)
plt.show()
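
The `notch=True` option used in the first panel marks an approximate 95% confidence interval around each median. As a rough sketch of where those bounds come from, the example below reads them from `matplotlib.cbook.boxplot_stats` (keys `cilo` and `cihi`) and compares them with the commonly used approximation median ± 1.57 × IQR / √n. The two-department data set and the helpers `manual_notch` and `notches_overlap` are illustrative assumptions, not part of the chapter's code.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(42)
groups = {
    'Engineering': rng.normal(95000, 18000, 200),
    'Marketing':   rng.normal(72000, 12000, 150),
}

# boxplot_stats returns one dict per dataset; 'cilo'/'cihi' are the notch bounds
stats = cbook.boxplot_stats(list(groups.values()))

def manual_notch(values):
    """Notch bounds from the median +/- 1.57 * IQR / sqrt(n) approximation."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(len(values))
    return med - half_width, med + half_width

def notches_overlap(a, b):
    """True if the notch intervals of two boxplot_stats dicts intersect."""
    return a['cilo'] <= b['cihi'] and b['cilo'] <= a['cihi']

for (name, values), s in zip(groups.items(), stats):
    lo, hi = manual_notch(values)
    print(f"{name}: median={s['med']:,.0f}  "
          f"notch=[{s['cilo']:,.0f}, {s['cihi']:,.0f}]  "
          f"manual=[{lo:,.0f}, {hi:,.0f}]")

print("Notches overlap:", notches_overlap(stats[0], stats[1]))
```

Non-overlapping notches are usually read as rough visual evidence that two medians differ. Treat this as a heuristic, not a formal hypothesis test.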

4. Common Mistakes

  • Misreading whiskers: Whiskers extend to the most extreme data points that still lie within 1.5×IQR of the box, not necessarily to the overall minimum and maximum. Points beyond the whiskers are plotted individually as outliers.
  • Using box plots for very small samples: With fewer than about 20 observations, quartile estimates are unstable and a box plot can mislead. Use a dot plot or show all data points instead.
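
Both pitfalls can be checked in code. The sketch below uses made-up salary samples; `plot_distribution` is a hypothetical helper, and its `min_n=20` cutoff is a rule of thumb taken from the list above, not a fixed standard.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cbook

rng = np.random.default_rng(0)
large = np.append(rng.normal(65000, 8000, 200), 150000)  # 200 points plus one planted extreme
small = rng.normal(65000, 8000, 12)                      # too few points for stable quartiles

# Mistake 1: the upper whisker ends at the last point inside the fence, not at the maximum
s = cbook.boxplot_stats(large)[0]
print(f"max={large.max():,.0f}  upper whisker={s['whishi']:,.0f}  outliers={len(s['fliers'])}")

# Mistake 2: fall back to raw points when the sample is small
def plot_distribution(ax, values, position, min_n=20):
    """Draw a box plot for n >= min_n, otherwise a dot plot (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    if len(values) < min_n:
        ax.scatter(np.full(len(values), position), values, alpha=0.7)
        return 'dots'
    ax.boxplot(values, positions=[position], manage_ticks=False)
    return 'box'

fig, ax = plt.subplots(figsize=(6, 4))
kinds = [plot_distribution(ax, large, 1), plot_distribution(ax, small, 2)]
ax.set_xlim(0.5, 2.5)
ax.set_xticks([1, 2])
ax.set_xticklabels(['n=201 (box)', 'n=12 (dots)'])
ax.set_ylabel('Annual Salary ($)')
```

The planted value of 150,000 appears among the fliers, and the upper whisker ends well below it. Call `plt.show()` or `fig.savefig(...)` to view the figure.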

5. MCQs

Question 1

Box plot median is located at?

Question 2

IQR (Interquartile Range) is?

Question 3

Box plot outliers are points beyond?

Question 4

Notched box plot shows?

Question 5

Box plot whiskers extend to?

Question 6

Box + strip chart combines?

Question 7

patch_artist=True in boxplot?

Question 8

Box plots are most useful for?

Question 9

If Q1=60K and Q3=90K, IQR=?

Question 10

Violin plot vs box plot?

6. Interview Questions

  • Q: What are the five values shown in a box plot?
  • Q: How does a box plot identify outliers?

7. Summary

Box plots show median, IQR (middle 50%), and outliers in one compact visual — ideal for group comparison. Whiskers extend to extreme non-outlier data (1.5×IQR rule). Notches show median confidence intervals. Strip/jitter overlay shows all data points. Violin plots add distribution shape. Seaborn's boxplot and violinplot are simpler to use.

8. Next Chapter Recommendation

In Chapter 11: Working with Colors, Themes, and Styles, we design professionally styled charts with color theory, accessibility principles, and Matplotlib/Seaborn themes.
