CHAPTER 14 Beginner

Statistical Visualization with Seaborn

Updated: May 18, 2026

1. Chapter Introduction

Seaborn's statistical plots (pair plots, regression overlays, residual plots, KDE curves, and ECDFs) turn exploratory data analysis (EDA) from raw data exploration into data storytelling. This chapter covers the Seaborn statistical charts you will reach for most often.

2. Pair Plot — Comprehensive EDA

python
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style='whitegrid', font_scale=1.0)
iris = sns.load_dataset('iris')

# Pair plot: every pair of numeric variables
g = sns.pairplot(iris, hue='species',
                  palette={'setosa': '#E91E63', 'versicolor': '#2196F3', 'virginica': '#4CAF50'},
                  diag_kind='kde',      # KDE on diagonal (distribution)
                  plot_kws={'alpha': 0.6, 's': 50},
                  diag_kws={'fill': True})
g.figure.suptitle('Iris Dataset — All Pairwise Relationships', y=1.02, fontsize=13, fontweight='bold')
plt.savefig('pairplot.png', dpi=150, bbox_inches='tight')
plt.show()

3. Regression Plots

python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

tips = sns.load_dataset('tips')

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# regplot: linear fit with a 95% confidence band (axes-level, so ax= works)
sns.regplot(data=tips, x='total_bill', y='tip', ax=axes[0],
             scatter_kws={'alpha': 0.5}, line_kws={'color': 'red'})
axes[0].set_title('Linear Regression\nBill vs Tip')

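# residplot: residuals of the linear fit, used to check model fit
# (lowess=True adds a smoothed trend line and requires statsmodels)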
sns.residplot(data=tips, x='total_bill', y='tip', ax=axes[1], lowess=True,
               scatter_kws={'alpha': 0.5}, line_kws={'color': 'orange'})
axes[1].axhline(0, color='gray', linestyle='--')
axes[1].set_title('Residual Plot\n(Good if random around 0)')

# Regression by group
for smoker, color in [('Yes', '#E91E63'), ('No', '#2196F3')]:
    subset = tips[tips['smoker'] == smoker]
    axes[2].scatter(subset['total_bill'], subset['tip'], alpha=0.5, color=color, label=smoker)
    z = np.polyfit(subset['total_bill'], subset['tip'], 1)
    p = np.poly1d(z)
    x_line = np.linspace(subset['total_bill'].min(), subset['total_bill'].max(), 100)
    axes[2].plot(x_line, p(x_line), color=color, linewidth=2)
axes[2].legend(title='Smoker')
axes[2].set_title('Regression by Group\n(Smoker vs Non-smoker)')
axes[2].set_xlabel('Total Bill ($)')
axes[2].set_ylabel('Tip ($)')

for ax in axes:
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('regression_plots.png', dpi=150)
plt.show()
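
To make the residual plot less of a black box, the sketch below computes the same quantity by hand: residual = observed value minus fitted value from a straight-line fit. It uses synthetic data (the `total_bill` and `tip` arrays here are generated, not the real tips dataset) and `np.polyfit`, as in the group regression above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the tips data: tip is about 15% of the bill plus noise
total_bill = rng.uniform(5, 50, size=200)
tip = 0.15 * total_bill + 1.0 + rng.normal(0, 1.0, size=200)

# Fit tip = slope * total_bill + intercept by least squares
slope, intercept = np.polyfit(total_bill, tip, 1)
fitted = slope * total_bill + intercept

# Residuals are what a residual plot shows on its y-axis
residuals = tip - fitted

# For a least-squares line with an intercept, the residuals average to zero
# and are uncorrelated with x, up to floating-point error
mean_residual = residuals.mean()
corr_with_x = np.corrcoef(total_bill, residuals)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"mean residual={mean_residual:.2e}, corr with x={corr_with_x:.2e}")
```

Because those two properties always hold for a least-squares line, they say nothing about fit quality. What a residual plot is checked for is shape: a curve or a funnel in the residuals suggests that a straight line, or a constant-variance assumption, does not describe the data well.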

4. KDE and Distribution Plots

python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
eng  = np.random.normal(95000, 18000, 300)
mkt  = np.random.normal(72000, 12000, 200)
sales = np.random.normal(65000, 20000, 250)

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# KDE comparison
for data, dept, color in [(eng, 'Engineering', '#1565C0'),
                            (mkt, 'Marketing', '#2E7D32'),
                            (sales, 'Sales', '#E65100')]:
    sns.kdeplot(data=data, ax=axes[0], label=dept, color=color, fill=True, alpha=0.3)
axes[0].set_title('KDE Salary by Department')
axes[0].legend()
axes[0].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x/1000:.0f}K'))

# ECDF — cumulative distribution
import pandas as pd
salary_df = pd.DataFrame({'Salary': np.concatenate([eng, mkt, sales]),
                           'Dept': ['Eng']*300 + ['Mkt']*200 + ['Sales']*250})
sns.ecdfplot(data=salary_df, x='Salary', hue='Dept', ax=axes[1])
axes[1].set_title('ECDF — Cumulative Distribution')
axes[1].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x/1000:.0f}K'))

# histplot with a KDE overlay (the axes-level counterpart of displot)
sns.histplot(data=salary_df, x='Salary', hue='Dept', kde=True,
              alpha=0.4, bins=30, ax=axes[2])
axes[2].set_title('Histogram + KDE by Department')
axes[2].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x/1000:.0f}K'))

plt.tight_layout()
plt.savefig('kde_plots.png', dpi=150)
plt.show()
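
An ECDF is simple enough to compute by hand, which helps when reading the plot. The sketch below is an illustration, not Seaborn's internal code: the helper name `ecdf` is hypothetical and the salary sample is synthetic. Each sorted value is paired with the proportion of observations less than or equal to it, so the y-axis runs from 1/n up to 1.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the proportion of data <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

rng = np.random.default_rng(42)
salaries = rng.normal(72000, 12000, 200)  # synthetic salary sample

x, y = ecdf(salaries)

# Read a quantile off the curve: first salary where the ECDF reaches 0.5
median_estimate = x[np.searchsorted(y, 0.5)]
print(f"ECDF starts at {y[0]:.3f} and ends at {y[-1]:.1f}")
print(f"Approximate median salary: ${median_estimate:,.0f}")
```

Unlike a histogram or KDE, this needs no bin count or bandwidth, which is why ECDFs are a useful cross-check when a smoothed curve looks suspicious.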

5. Common Mistakes

  • sns.lmplot() vs sns.regplot(): lmplot is figure-level: it creates its own figure and returns a FacetGrid, so it does not accept ax=. regplot is axes-level and draws on an existing Axes. Use regplot inside plt.subplots() layouts.
  • Pairplot with too many variables: A pair plot of n variables has n × n panels, so more than 5-6 columns makes it unreadable and slow to draw. Select key features (for example with vars=) before plotting.
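
The difference between the two regression functions shows up in what they return. The sketch below uses a small synthetic DataFrame (`df` and its columns are generated here to mimic the tips data) and sets `ci=None` only to keep it fast.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
n = 120

# Synthetic stand-in for the tips dataset
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "smoker": rng.choice(["Yes", "No"], n),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

# regplot is axes-level: it draws on the Axes you pass and returns it
fig, ax = plt.subplots(figsize=(5, 4))
returned_ax = sns.regplot(data=df, x="total_bill", y="tip", ax=ax, ci=None)

# lmplot is figure-level: it builds a new figure and returns a FacetGrid.
# hue= fits one regression line per group in a single call.
grid = sns.lmplot(data=df, x="total_bill", y="tip", hue="smoker", ci=None)

print(type(returned_ax).__name__, type(grid).__name__)
```

In practice this means regplot fits into a multi-panel layout you control, while lmplot is the shorter route when you want per-group or faceted regressions and are happy to let Seaborn own the figure.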

6. MCQs

Question 1

sns.pairplot(iris, hue='species') creates?

Question 2

diag_kind='kde' in pairplot?

Question 3

sns.regplot() differs from lmplot() by?

Question 4

Residual plot is used to?

Question 5

ECDF shows?

Question 6

fill=True in kdeplot?

Question 7

sns.histplot(kde=True) combines?

Question 8

sns.ecdfplot() Y-axis range?

Question 9

Recommended max variables for pairplot?

Question 10

scatter_kws={'alpha': 0.5} in regplot?

7. Interview Questions

  • Q: What does a pair plot tell you during EDA?
  • Q: How do you interpret a residual plot?

8. Summary

Seaborn's statistical toolkit: pairplot for an all-variable EDA overview, regplot for a regression overlay, residplot for model diagnostics, kdeplot for smooth distribution comparison, and ecdfplot for cumulative distributions. Together these five charts cover most routine statistical EDA tasks in data science workflows.

9. Next Chapter Recommendation

In Chapter 15: Heatmaps and Correlation Matrices, we visualize relationships between many variables simultaneously using color-encoded matrix charts.
