Python for Data Science
CHAPTER 19 Beginner

Statistical Visualization with Seaborn

Updated: May 18, 2026

1. Chapter Introduction

Matplotlib is incredibly powerful, but its default charts can look a bit dated, and writing the code to connect it to a Pandas DataFrame can be tedious. Enter Seaborn. Seaborn is a library built *on top* of Matplotlib. It comes with beautiful, modern defaults and is specifically designed to plot Pandas DataFrames with minimal code.

2. Seaborn Basics and Installation

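Install it from a terminal with pip install seaborn. Inside a Jupyter or Colab notebook cell, prefix the command with an exclamation mark: !pip install seaborn.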

python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load a built-in sample dataset from Seaborn
df = sns.load_dataset('tips')

print(df.head(2))
# Output:
#    total_bill   tip     sex smoker  day    time  size
# 0       16.99  1.01  Female     No  Sun  Dinner     2
# 1       10.34  1.66    Male     No  Sun  Dinner     3

3. Distribution Plots (Histograms with Density)

Seaborn's histplot builds on Matplotlib's histogram. It reads a column from a DataFrame by name and, with kde=True, overlays a KDE (Kernel Density Estimate) curve, which is a smoothed estimate of the data's distribution.

python
# Create a histogram of Total Bills, add a KDE curve
sns.histplot(data=df, x='total_bill', kde=True, color='purple')

# We still use Matplotlib to add the title!
plt.title("Distribution of Total Bills")
plt.show()
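
To make the KDE curve less of a black box, here is a minimal sketch of the idea behind it: place a small Gaussian bump on every data point and average the bumps. The function name gaussian_kde_curve, the made-up bill amounts, and the hand-picked bandwidth of 3.0 are all assumptions for illustration. Seaborn chooses a bandwidth for you, and this is not a copy of its internal code.

```python
import numpy as np

def gaussian_kde_curve(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each point of `grid`."""
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = distance from grid point i to data point j, in bandwidth units
    z = (grid[:, None] - values[None, :]) / bandwidth
    # Standard normal bump centered on each data point
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so the curve integrates to 1
    return kernels.sum(axis=1) / (len(values) * bandwidth)

# Made-up bill amounts (illustrative, not the real 'tips' data)
bills = np.array([10.3, 16.9, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9, 15.0, 14.8])
grid = np.linspace(bills.min() - 15, bills.max() + 15, 400)
density = gaussian_kde_curve(bills, grid, bandwidth=3.0)

# A density curve should enclose an area close to 1
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"Area under the curve: {area:.3f}")
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one gives a smoother curve that can hide real structure.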

4. Categorical Plots (Boxplots)

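If you want to compare numerical data across different categories, a boxplot is a common choice. The line inside each box marks the median, the box spans the first to third quartile, the whiskers show the spread of the remaining typical values, and points drawn beyond the whiskers are flagged as outliers.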

python
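# Compare total bills across different days of the week
# Notice how Seaborn takes the DataFrame directly via the 'data' parameter!
# Seaborn 0.13+ deprecates 'palette' without 'hue', so we set hue='day' and hide the redundant legend
sns.boxplot(data=df, x='day', y='total_bill', hue='day', palette='Set2', legend=False)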
plt.title("Total Bills by Day")
plt.show()
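
The numbers a boxplot draws can be computed by hand with Pandas. The sketch below uses a small made-up Series and assumes the common 1.5 × IQR rule for the whisker limits, which is the default in Matplotlib and Seaborn unless you change the whis parameter.

```python
import pandas as pd

# Made-up bill amounts with one unusually large value
bills = pd.Series([8, 10, 12, 13, 14, 15, 16, 18, 20, 48], name="total_bill")

q1 = bills.quantile(0.25)      # bottom edge of the box
median = bills.median()        # line inside the box
q3 = bills.quantile(0.75)      # top edge of the box
iqr = q3 - q1                  # interquartile range (height of the box)

# Points outside these fences are drawn as individual outlier dots
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())
```

Here only the value 48 falls outside the fences, so it is the only point a boxplot would draw as a separate dot.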

5. Regression Plots (lmplot)

Instead of a basic scatter plot, Seaborn's lmplot (Linear Model Plot) draws a scatter plot *and* automatically fits and draws a line of best fit through the data.

python
# Does a higher bill result in a higher tip?
sns.lmplot(data=df, x='total_bill', y='tip', scatter_kws={'alpha':0.5})
plt.title("Bill vs Tip (with Regression Line)")
plt.show()
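
If you want the actual slope and intercept rather than just the picture, you can fit the same kind of straight line yourself. This is a minimal sketch using numpy.polyfit on made-up numbers. It illustrates an ordinary least-squares line and is not a claim about how lmplot is implemented internally.

```python
import numpy as np

# Made-up bills and tips (illustrative, not the real 'tips' data)
total_bill = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 40.0])
tip = np.array([1.5, 2.5, 3.0, 3.5, 5.0, 6.0])

# Fit tip = slope * total_bill + intercept by least squares
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Predicted tip for each bill, and how far each real tip is from the line
predicted = slope * total_bill + intercept
residuals = tip - predicted

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The slope is the extra tip expected per extra unit of bill. With these numbers it comes out at roughly 0.15.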

6. The Pair Plot (The Magic Function)

When you first load a dataset, you often want to see how *every* numerical variable relates to every other variable. Writing each plot by hand is tedious and grows quickly with the number of columns. Seaborn's pairplot does it in one line.

python
# Draws scatter plots for every pair of numeric variables, and each variable's
# own distribution on the diagonal (KDE curves by default when 'hue' is set)
sns.pairplot(df, hue='sex') # 'hue' color-codes the dots by the 'sex' column
plt.show()

7. The Correlation Heatmap

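A correlation heatmap uses color to show the correlation coefficient between each pair of variables. The coefficient ranges from -1 (the variables move in opposite directions) through 0 (no linear relationship) to 1 (they move together).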

python
# 1. Calculate the correlation matrix (Pandas function)
# Note: We must select only numeric columns first!
# include='number' matches every numeric dtype, not just float64 and int64
numeric_df = df.select_dtypes(include='number')
corr_matrix = numeric_df.corr()

# 2. Plot the Heatmap (Seaborn function)
# annot=True puts the actual numbers inside the colored squares
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title("Correlation Heatmap")
plt.show()
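
To build intuition for the numbers inside the squares, here is a small sketch with made-up columns: one that mostly rises as x rises and one that falls as x rises. The column names are hypothetical and chosen only to make the signs easy to read.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "rises_with_x": [2, 4, 5, 4, 6],     # tends to go up as x goes up
    "falls_with_x": [10, 8, 7, 5, 1],    # goes down as x goes up
})

corr_matrix = df.corr()
print(corr_matrix.round(2))
```

The diagonal is always 1 because every column is perfectly correlated with itself. The rising column gets a strongly positive value against x and the falling column a strongly negative one, which is what the warm and cool ends of the 'coolwarm' palette encode.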

8. Common Mistakes

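  • Forgetting plt.show(): Seaborn draws through Matplotlib (plt), so in a plain Python script nothing appears on screen until you call plt.show(). Jupyter notebooks usually display the figure automatically at the end of a cell, but calling plt.show() there is harmless and keeps your code portable.
  • Passing text to a correlation matrix: In pandas 2.0 and later, df.corr() raises an error if your DataFrame contains text columns (like "Name"); older versions silently dropped those columns. Filter for numerical columns, or pass numeric_only=True, before computing a correlation matrix.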
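
The second mistake is easy to avoid. The sketch below uses a made-up DataFrame with one text column and filters it down to numeric columns before calling corr().

```python
import pandas as pd

# Made-up data: 'name' is text, the other two columns are numeric
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "height_cm": [160.0, 175.0, 182.0, 168.0],
    "weight_kg": [55, 72, 80, 63],
})

# Keep only numeric columns, then compute the correlation matrix
numeric_df = df.select_dtypes(include="number")
corr_matrix = numeric_df.corr()

print(list(corr_matrix.columns))
```

The 'name' column never reaches corr(), so the result contains only height_cm and weight_kg.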

9. MCQs

Question 1

What is the standard alias for importing Seaborn?

Question 2

How is Seaborn related to Matplotlib?

Question 3

Which Seaborn function creates a histogram and can overlay a smooth density curve using kde=True?

Question 4

Which chart type shows the median, quartiles, and outliers of numerical data split by categories?

Question 5

What does sns.lmplot() do?

Question 6

Which Seaborn function creates a massive grid of scatterplots comparing every numerical variable against every other numerical variable?

Question 7

What does the hue='sex' parameter do in a Seaborn plot?

Question 8

What Pandas function must you run *before* you can draw a correlation heatmap with Seaborn?

Question 9

What does annot=True do in sns.heatmap()?

Question 10

Do you still need to import matplotlib.pyplot as plt when using Seaborn?

10. Interview Questions

  • Q: Explain what a Correlation Heatmap is. How do you interpret a value of 0.9 versus a value of -0.9?
  • Q: If you receive a dataset with 15 numerical columns and need to quickly find which variables are highly correlated, what Seaborn function would you use?
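
For the second question, a heatmap gives the visual answer, and a short Pandas helper can list the strongest pairs as text. This is a minimal sketch: the helper name top_correlated_pairs and the 0.8 threshold are hypothetical choices, and the four made-up columns stand in for the fifteen in the question.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(df, threshold=0.8):
    """Return column pairs whose absolute correlation is at least `threshold`."""
    corr = df.select_dtypes(include="number").corr()
    # Keep only the upper triangle so each pair appears once
    # (k=1 also drops the diagonal, where every value is 1)
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    strong = pairs[pairs.abs() >= threshold]
    # Strongest relationships first, whether positive or negative
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)

a = np.arange(10.0)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + 1,                       # rises exactly with a
    "c": -a,                              # falls exactly as a rises
    "d": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],  # only weakly related to a
})

result = top_correlated_pairs(df)
print(result)
```

Values near 0.9 and -0.9 are equally strong relationships. The sign only tells you whether the two variables move in the same direction or in opposite directions, which is why the helper ranks pairs by absolute value.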

11. Summary

Seaborn upgrades your visualization capabilities. It integrates directly with Pandas via the data=df parameter. Use sns.histplot() for distributions, sns.boxplot() for categorical comparisons, and sns.lmplot() for instant trend lines. For rapid dataset profiling, rely on sns.pairplot() and correlation heatmaps. Always remember: Seaborn builds the chart on top of Matplotlib, and in a script it is plt.show() that displays it.

12. Next Chapter Recommendation

In Chapter 20: Exploratory Data Analysis (EDA), we will combine everything we have learned so far—Pandas, Matplotlib, and Seaborn—to perform a complete, end-to-end investigation of a brand new dataset.
