CHAPTER 14 Beginner

Statistical Visualization with Seaborn

Updated: May 18, 2026

1. Chapter Introduction

Seaborn's statistical visualization tools (pair plots, regression overlays, KDE curves, and violin plots) turn exploratory data analysis (EDA) from raw data exploration into data storytelling. This chapter covers pair plots, regression and residual plots, and distribution plots (KDE, ECDF, and histograms).

2. Pair Plot — Comprehensive EDA

python

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style='whitegrid', font_scale=1.0)
iris = sns.load_dataset('iris')

# Pair plot: every pair of numeric variables
g = sns.pairplot(iris, hue='species',
                 palette={'setosa': '#E91E63', 'versicolor': '#2196F3', 'virginica': '#4CAF50'},
                 diag_kind='kde',      # KDE on diagonal (distribution)
                 plot_kws={'alpha': 0.6, 's': 50},
                 diag_kws={'fill': True})
g.figure.suptitle('Iris Dataset: All Pairwise Relationships', y=1.02, fontsize=13, fontweight='bold')
plt.savefig('pairplot.png', dpi=150, bbox_inches='tight')
plt.show()

3. Regression Plots

python

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

tips = sns.load_dataset('tips')

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# regplot: scatter plus linear fit with a confidence band
sns.regplot(data=tips, x='total_bill', y='tip', ax=axes[0],
            scatter_kws={'alpha': 0.5}, line_kws={'color': 'red'})
axes[0].set_title('Linear Regression\nBill vs Tip')

# residplot: residuals of the linear fit (lowess=True requires statsmodels)
sns.residplot(data=tips, x='total_bill', y='tip', ax=axes[1], lowess=True,
              scatter_kws={'alpha': 0.5}, line_kws={'color': 'orange'})
axes[1].axhline(0, color='gray', linestyle='--')
axes[1].set_title('Residual Plot\n(Good if random around 0)')

# Regression by group: fit one straight line per smoker category
for smoker, color in [('Yes', '#E91E63'), ('No', '#2196F3')]:
    subset = tips[tips['smoker'] == smoker]
    axes[2].scatter(subset['total_bill'], subset['tip'], alpha=0.5, color=color, label=smoker)
    z = np.polyfit(subset['total_bill'], subset['tip'], 1)   # [slope, intercept]
    p = np.poly1d(z)
    x_line = np.linspace(subset['total_bill'].min(), subset['total_bill'].max(), 100)
    axes[2].plot(x_line, p(x_line), color=color, linewidth=2)
axes[2].legend(title='Smoker')
axes[2].set_title('Regression by Group\n(Smoker vs Non-smoker)')
axes[2].set_xlabel('Total Bill ($)')
axes[2].set_ylabel('Tip ($)')

# Remove the top and right borders on every panel
for ax in axes:
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('regression_plots.png', dpi=150)
plt.show()
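
To make clearer what a residual plot displays, here is a minimal sketch that computes residuals by hand with NumPy. It assumes a plain straight-line least-squares fit and uses synthetic stand-in data (the variable names and values are illustrative, not taken from the tips dataset), so it approximates what `residplot` draws by default rather than reproducing seaborn's internals.

```python
import numpy as np

# Synthetic stand-in for bill/tip data (hypothetical values, not the tips dataset)
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.1 * total_bill + rng.normal(0, 0.8, size=200)

# Fit tip = slope * total_bill + intercept by ordinary least squares
slope, intercept = np.polyfit(total_bill, tip, 1)

# Residual = observed value minus fitted value
fitted = slope * total_bill + intercept
residuals = tip - fitted

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"mean residual={residuals.mean():.2e}")
```

For a least-squares line with an intercept, the residuals average to zero by construction, so the useful signal in a residual plot is the pattern around zero (curvature, funnel shapes), not the mean.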

4. KDE and Distribution Plots

python

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Synthetic salary samples per department (reproducible via the seed)
np.random.seed(42)
eng = np.random.normal(95000, 18000, 300)
mkt = np.random.normal(72000, 12000, 200)
sales = np.random.normal(65000, 20000, 250)

def dollars_k(x, _):
    """Format a tick value such as 95000 as '$95K'."""
    return f'${x/1000:.0f}K'

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# KDE comparison: one smoothed density curve per department
for data, dept, color in [(eng, 'Engineering', '#1565C0'),
                          (mkt, 'Marketing', '#2E7D32'),
                          (sales, 'Sales', '#E65100')]:
    sns.kdeplot(x=data, ax=axes[0], label=dept, color=color, fill=True, alpha=0.3)
axes[0].set_title('KDE Salary by Department')
axes[0].legend()
axes[0].xaxis.set_major_formatter(FuncFormatter(dollars_k))

# ECDF: cumulative distribution
salary_df = pd.DataFrame({'Salary': np.concatenate([eng, mkt, sales]),
                          'Dept': ['Eng']*300 + ['Mkt']*200 + ['Sales']*250})
sns.ecdfplot(data=salary_df, x='Salary', hue='Dept', ax=axes[1])
axes[1].set_title('ECDF: Cumulative Distribution')
axes[1].xaxis.set_major_formatter(FuncFormatter(dollars_k))

# histplot: histogram with a KDE overlay (axes-level counterpart of displot)
sns.histplot(data=salary_df, x='Salary', hue='Dept', kde=True,
             alpha=0.4, bins=30, ax=axes[2])
axes[2].set_title('Histogram + KDE by Department')
axes[2].xaxis.set_major_formatter(FuncFormatter(dollars_k))

plt.tight_layout()
plt.savefig('kde_plots.png', dpi=150)
plt.show()
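
The ECDF panel can be easier to read once you see how the curve is built. The sketch below computes an empirical CDF by hand with NumPy on synthetic values (the data and variable names are illustrative assumptions). It shows the idea behind the default proportion-scaled curve, not seaborn's actual implementation.

```python
import numpy as np

# Synthetic salaries (hypothetical values, for illustration only)
rng = np.random.default_rng(42)
salaries = rng.normal(72000, 12000, 200)

# ECDF: sort the values, then assign each one the fraction of
# observations that are less than or equal to it
x = np.sort(salaries)
y = np.arange(1, len(x) + 1) / len(x)

# Reading the curve: the first x whose cumulative proportion reaches 0.5
median_estimate = x[np.searchsorted(y, 0.5)]
print(f"ECDF runs from {y[0]:.3f} to {y[-1]:.1f}")
print(f"approximate median: {median_estimate:,.0f}")
```

Because `y` is a proportion, it rises in equal steps of 1/n and ends at exactly 1, which is why any quantile can be read straight off an ECDF without choosing a bin width or bandwidth.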

5. Common Mistakes

sns.lmplot() vs sns.regplot(): lmplot is a figure-level function that creates its own figure, so it does not accept ax=. regplot is axes-level and draws on an existing Axes. Use regplot inside subplot layouts.
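
The difference is easiest to see side by side. This is a minimal sketch on synthetic data (the DataFrame and its column names `x`, `y`, and `group` are made up for illustration): `regplot` and `residplot` draw into axes created by `plt.subplots`, while `lmplot` returns a `FacetGrid` that owns a separate figure.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data; column names are hypothetical
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.uniform(0, 10, 120)})
df["y"] = 2.0 * df["x"] + rng.normal(0, 2, 120)
df["group"] = rng.choice(["a", "b"], size=120)

# Axes-level: regplot and residplot draw into axes you already created
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.regplot(data=df, x="x", y="y", ax=axes[0])
sns.residplot(data=df, x="x", y="y", ax=axes[1])

# Figure-level: lmplot builds its own figure and returns a FacetGrid,
# here with one panel per value of "group"
grid = sns.lmplot(data=df, x="x", y="y", col="group")
print(type(grid).__name__, "owns a separate figure:", grid.figure is not fig)
```

A reasonable rule of thumb: reach for `lmplot` when you want seaborn to handle faceting by `col`, `row`, or `hue`, and for `regplot` when the regression is one panel in a layout you control. Call `plt.show()` to display both figures.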

Pairplot with too many variables: With more than 5-6 columns, the grid of panels becomes too small to read. Select key features before plotting, for example with the vars parameter.

6. MCQs

Question 1

`sns.pairplot(iris, hue='species')` creates?

Question 2

`diag_kind='kde'` in pairplot?

Question 3

sns.regplot() differs from lmplot() by?

Question 4

Residual plot is used to?

Question 5

ECDF shows?

Question 6

fill=True in kdeplot?

Question 7

sns.histplot(kde=True) combines?

Question 8

sns.ecdfplot() Y-axis range?

Question 9

Recommended max variables for pairplot?

Question 10

`scatter_kws={'alpha': 0.5}` in regplot?

7. Interview Questions

Q: What does a pair plot tell you during EDA?

Q: How do you interpret a residual plot?

8. Summary

Seaborn's statistical toolkit: pairplot for an all-variable EDA overview, regplot for regression overlays, residplot for model diagnostics, kdeplot for smooth distribution comparison, and ecdfplot for cumulative distributions. Together, these five charts cover most routine statistical EDA tasks in data science workflows.

9. Next Chapter Recommendation

In Chapter 15: Heatmaps and Correlation Matrices, we visualize relationships between many variables simultaneously using color-encoded matrix charts.
