CHAPTER 09 Beginner

Histograms and Distribution Analysis

Updated: May 18, 2026

1. Chapter Introduction

Before modeling data, you must understand its distribution: is it normal, skewed, or bimodal? A histogram answers this question visually in seconds. This chapter covers histograms, kernel density estimates (KDE), and comparing distributions across groups.

2. Histogram Fundamentals

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

np.random.seed(42)
salaries = np.random.normal(loc=75000, scale=15000, size=1000)

fig, axes = plt.subplots(2, 2, figsize=(13, 10))

# 1: Basic histogram (raw counts)
axes[0, 0].hist(salaries, bins=30, color='#2196F3', edgecolor='white', alpha=0.8)
axes[0, 0].set_title('Basic Histogram (30 bins)')
axes[0, 0].set_xlabel('Salary ($)')
axes[0, 0].set_ylabel('Count')

# 2: Different bin counts. density=True puts both histograms on one scale;
#    with raw counts the 5x wider 10-bin bars would look ~5x taller
axes[0, 1].hist(salaries, bins=10, color='#4CAF50', edgecolor='white', alpha=0.8,
                density=True, label='10 bins')
axes[0, 1].hist(salaries, bins=50, color='#FF9800', edgecolor='white', alpha=0.5,
                density=True, label='50 bins')
axes[0, 1].set_title('Bin Count Comparison')
axes[0, 1].legend()

# 3: Histogram + KDE (density=True so the bars and the KDE share a y-scale)
axes[1, 0].hist(salaries, bins=30, color='#9C27B0', edgecolor='white',
                alpha=0.6, density=True)
x_range = np.linspace(salaries.min(), salaries.max(), 200)
kde = stats.gaussian_kde(salaries)
axes[1, 0].plot(x_range, kde(x_range), 'r-', linewidth=2.5, label='KDE')
axes[1, 0].axvline(salaries.mean(), color='blue', linestyle='--',
                   label=f'Mean: ${salaries.mean():,.0f}')
axes[1, 0].axvline(np.median(salaries), color='green', linestyle='--',
                   label=f'Median: ${np.median(salaries):,.0f}')
axes[1, 0].set_title('Histogram + KDE with Reference Lines')
axes[1, 0].legend(fontsize=8)

# 4: Right-skewed distribution (shifted exponential)
skewed = np.random.exponential(scale=30000, size=1000) + 20000
axes[1, 1].hist(skewed, bins=40, color='#F44336', edgecolor='white', alpha=0.8, density=True)
x_s = np.linspace(skewed.min(), skewed.max(), 200)
axes[1, 1].plot(x_s, stats.gaussian_kde(skewed)(x_s), 'b-', linewidth=2.5)
skewness = stats.skew(skewed)
axes[1, 1].set_title(f'Right-Skewed Distribution (skew={skewness:.2f})')
axes[1, 1].set_xlabel('Value')

# Shared styling: hide top/right spines and add a light grid
for ax in axes.flatten():
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.grid(True, alpha=0.3)

plt.suptitle('Histogram Variations', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('histograms.png', dpi=150)
plt.show()
```
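The mean/median reference lines in panel 3 and the skew value in panel 4 can also be read off numerically. A minimal sketch that regenerates the same synthetic data (same seed and draw order as the figure code, assumed here so the numbers match the plot) and prints mean, median, their gap, and `scipy.stats.skew` for each dataset:

```python
import numpy as np
from scipy import stats

# Regenerate the figure's datasets: same seed, same order of random draws
np.random.seed(42)
salaries = np.random.normal(loc=75000, scale=15000, size=1000)
skewed = np.random.exponential(scale=30000, size=1000) + 20000

for name, data in [('Normal salaries', salaries), ('Right-skewed', skewed)]:
    mean, median = data.mean(), np.median(data)
    # A large positive (mean - median) and skew > 0 both point to a long right tail
    print(f"{name:<16} mean=${mean:,.0f}  median=${median:,.0f}  "
          f"mean-median=${mean - median:,.0f}  skew={stats.skew(data):.2f}")
```

For the roughly normal salaries the mean and median nearly coincide and the skew is close to 0. For the shifted exponential, the theoretical mean is $50,000 and the theoretical median is 20,000 + 30,000 × ln 2 ≈ $40,794, so the long right tail pulls the mean above the median; sample values will differ slightly.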

3. Distribution Comparison

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Compare salary distributions across departments
np.random.seed(42)
eng_salaries = np.random.normal(95000, 18000, 300)
mkt_salaries = np.random.normal(72000, 12000, 200)
sales_salaries = np.random.normal(65000, 20000, 250)

groups = [(eng_salaries, 'Engineering', '#1565C0'),
          (mkt_salaries, 'Marketing', '#2E7D32'),
          (sales_salaries, 'Sales', '#E65100')]

# One shared set of bin edges so every department is binned identically
all_salaries = np.concatenate([data for data, _, _ in groups])
bins = np.histogram_bin_edges(all_salaries, bins=30)

fig, ax = plt.subplots(figsize=(11, 6))

for data, dept, color in groups:
    # density=True because the departments have different headcounts (300/200/250)
    ax.hist(data, bins=bins, alpha=0.5, color=color, edgecolor='none',
            density=True, label=dept)
    ax.axvline(data.mean(), color=color, linestyle='--', linewidth=1.5)

ax.set_title('Salary Distribution by Department', fontsize=14, fontweight='bold')
ax.set_xlabel('Annual Salary ($)')
ax.set_ylabel('Density')
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, _: f'${x/1000:.0f}K'))
ax.legend(title='Department')
ax.grid(True, alpha=0.3)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('distribution_comparison.png', dpi=150)
plt.show()
```
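Binning matters when histograms are overlaid. Passing each department to `ax.hist(..., bins=30)` separately makes NumPy spread 30 bins across that department's own min..max, so bars from different groups end up with different widths and their heights are not directly comparable. A short sketch using the same department parameters (the seed is an arbitrary choice) that prints the per-group bin widths next to a single shared width:

```python
import numpy as np

# Same department parameters as the comparison plot; seed chosen arbitrarily
np.random.seed(42)
departments = {
    'Engineering': np.random.normal(95000, 18000, 300),
    'Marketing': np.random.normal(72000, 12000, 200),
    'Sales': np.random.normal(65000, 20000, 250),
}

# bins=30 per group: each group gets 30 bins over its OWN range
own_widths = {}
for dept, data in departments.items():
    edges = np.histogram_bin_edges(data, bins=30)
    own_widths[dept] = edges[1] - edges[0]
    print(f"{dept:<12} own bin width: ${own_widths[dept]:,.0f}")

# One shared set of edges spanning all groups: identical bins for everyone
shared_edges = np.histogram_bin_edges(np.concatenate(list(departments.values())), bins=30)
shared_width = shared_edges[1] - shared_edges[0]
print(f"Shared bin width: ${shared_width:,.0f}")
```

The shared width is never smaller than any per-group width, because the combined range covers every group's range; passing `shared_edges` as `bins=` to each `ax.hist` call gives every department the same bars.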

4. Common Mistakes

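Wrong bin count: Too few bins hide the shape; too many produce a noisy, spiky outline. Sturges' rule sets bins = 1 + log2(n), rounded up, so n = 1000 gives 11 bins. The Freedman-Diaconis rule sets the bin width to 2 × IQR × n^(-1/3); because it is based on the interquartile range rather than the full range, it is robust to outliers.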
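Both rules are built into NumPy: `np.histogram_bin_edges` accepts `bins='sturges'` and `bins='fd'`, and `plt.hist` accepts the same strings for its `bins` argument. A minimal sketch on the chapter's synthetic salary data that computes each rule by hand and checks it against NumPy:

```python
import math

import numpy as np

np.random.seed(42)
salaries = np.random.normal(loc=75000, scale=15000, size=1000)
n = salaries.size

# Sturges' rule by hand: bins = 1 + log2(n), rounded up
sturges_k = math.ceil(math.log2(n) + 1)

# Freedman-Diaconis by hand: width = 2 * IQR * n^(-1/3), bins = range / width (rounded up)
q75, q25 = np.percentile(salaries, [75, 25])
fd_width = 2.0 * (q75 - q25) * n ** (-1.0 / 3.0)
fd_k = math.ceil((salaries.max() - salaries.min()) / fd_width)

# NumPy's built-in versions (the same strings work as ax.hist(salaries, bins='fd'))
sturges_edges = np.histogram_bin_edges(salaries, bins='sturges')
fd_edges = np.histogram_bin_edges(salaries, bins='fd')

print(f"Sturges: {sturges_k} bins (NumPy: {len(sturges_edges) - 1})")
print(f"Freedman-Diaconis: width ${fd_width:,.0f}, {fd_k} bins (NumPy: {len(fd_edges) - 1})")
```

For n = 1000, Sturges gives 11 bins, while Freedman-Diaconis gives noticeably more here; Sturges tends to under-bin large samples.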

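Not using density=True for comparison: When comparing groups of different sizes, raw counts make the larger group's bars taller regardless of shape. Set density=True so each bar's height is count / (n × bin width); every histogram then has a total area of 1 and the shapes can be compared directly.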
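The normalization can be checked with `np.histogram`, which takes the same `density` flag as `plt.hist`. A minimal sketch with two hypothetical groups of 200 and 300 points (parameters are illustrative): counts sum to each group's size, while density values times bin widths sum to 1 for both groups.

```python
import numpy as np

# Two hypothetical groups of different sizes (illustrative parameters)
np.random.seed(0)
groups = {'small (n=200)': np.random.normal(72000, 12000, 200),
          'large (n=300)': np.random.normal(95000, 18000, 300)}

# Shared bin edges covering both groups
edges = np.histogram_bin_edges(np.concatenate(list(groups.values())), bins=30)
widths = np.diff(edges)

results = {}
for name, data in groups.items():
    counts, _ = np.histogram(data, bins=edges)
    density, _ = np.histogram(data, bins=edges, density=True)
    # density=True divides each count by (n * bin width)
    manual = counts / (data.size * widths)
    area = float((density * widths).sum())
    results[name] = (int(counts.sum()), area, np.allclose(density, manual))
    print(f"{name}: total count = {counts.sum()}, total area = {area:.3f}")
```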

5. MCQs

Question 1

What is a histogram best used for?

Question 2

What happens when a histogram has too few bins?

Question 3

What is a KDE (kernel density estimate)?

Question 4

What does `density=True` do in a histogram?

Question 5

What characterizes a right-skewed distribution?

Question 6

What does `stats.skew()` return?

Question 7

What does a bimodal distribution look like in a histogram?

Question 8

Which `alpha=` setting suits large, overlapping datasets?

Question 9

What does the Freedman-Diaconis rule use to determine bin width (and therefore bin count)?

Question 10

What does overlaying multiple histograms to compare distributions require?

6. Interview Questions

Q: How do you choose the right number of bins for a histogram?

Q: What does a right-skewed distribution tell you about the data?

7. Summary

Histograms reveal distribution shape: normal, skewed, bimodal, or uniform. Overlay a KDE for a smooth approximation of the density. Use density=True when comparing groups of different sizes. A gap between mean and median indicates skewness; in a right-skewed distribution the mean usually sits above the median. Bin count matters: too few bins over-smooth the shape, too many add noise.
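A bimodal distribution shows up as two separate humps with a trough between them. As a minimal sketch with hypothetical data (two salary bands mixed together; the parameters are illustrative and not from this chapter), you can count the local maxima of a KDE evaluated on a grid:

```python
import numpy as np
from scipy import stats

# Hypothetical bimodal data: a mixture of two well-separated salary bands
np.random.seed(1)
junior = np.random.normal(50000, 8000, 500)
senior = np.random.normal(110000, 8000, 500)
mixed = np.concatenate([junior, senior])

# Evaluate a KDE on a grid and find grid points higher than both neighbours
grid = np.linspace(mixed.min(), mixed.max(), 400)
density = stats.gaussian_kde(mixed)(grid)
is_peak = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
peaks = grid[1:-1][is_peak]

print("KDE peaks at:", ", ".join(f"${p:,.0f}" for p in peaks))
print(f"Mean: ${mixed.mean():,.0f}")
```

The mean lands in the low-density gap between the two peaks, which is why a single average can describe almost nobody in a bimodal dataset.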

8. Next Chapter Recommendation

In Chapter 10: Box Plots and Statistical Visualization, we visualize quartiles, medians, and outliers — comparing distributions across groups more compactly than histograms.
