
Heatmaps and Correlation Matrices

Updated: May 18, 2026

# CHAPTER 15

1. Chapter Introduction

Heatmaps encode numeric values as colors in a 2D matrix, which makes them well suited to correlation analysis, cross-tabulations, and time-based pattern detection. A single heatmap can reveal patterns across dozens of variables at once.

2. Correlation Heatmap

python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame({
    'Revenue':     np.random.normal(75000, 15000, 200),
    'Ad_Spend':    np.random.normal(12000, 3000, 200),
    'Customers':   np.random.randint(100, 500, 200),
    'Satisfaction':np.random.uniform(3, 5, 200),
    'Employees':   np.random.randint(10, 100, 200),
    'Profit':      np.random.normal(20000, 5000, 200)
})
df['Revenue'] = df['Revenue'] + df['Ad_Spend'] * 3 + df['Customers'] * 50

corr = df.corr()

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Full correlation matrix
sns.heatmap(corr, ax=axes[0],
             annot=True, fmt='.2f',       # Show values in cells
             cmap='RdBu_r',               # Diverging (reversed RdBu): blue=negative, red=positive
             vmin=-1, vmax=1,             # Fix scale to [-1, 1]
             center=0,                    # White = 0 correlation
             square=True,                 # Square cells
             linewidths=0.5,              # Cell borders
             linecolor='white',
             cbar_kws={'shrink': 0.8})
axes[0].set_title('Correlation Matrix', fontsize=13, fontweight='bold')

# Lower triangle only: mask the upper triangle (and diagonal) to remove redundancy
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, ax=axes[1],
             mask=mask,                   # Hide upper triangle
             annot=True, fmt='.2f',
             cmap='RdBu_r', vmin=-1, vmax=1, center=0,
             square=True, linewidths=0.5, linecolor='white')
axes[1].set_title('Correlation Matrix (Lower Triangle)', fontsize=13, fontweight='bold')

plt.suptitle('Business Metrics Correlation Analysis', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('correlation_heatmap.png', dpi=150)
plt.show()
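
The lower-triangle panel shows each variable pair exactly once. As a rough companion sketch, the same cells can also be listed as a table ranked by absolute correlation. The `demo` frame, its column names `x`, `y`, `z`, and the `pairs` and `ranked` variables are illustrative assumptions, not part of the chapter's dataset.

```python
import numpy as np
import pandas as pd

# Small synthetic dataset (illustrative values, not the chapter's data)
rng = np.random.default_rng(0)
x = rng.normal(size=300)
demo = pd.DataFrame({
    'x': x,
    'y': 2 * x + rng.normal(scale=0.5, size=300),  # built to track x closely
    'z': rng.normal(size=300),                     # independent noise
})
corr = demo.corr()

# Indices of the strict lower triangle (k=-1 excludes the diagonal),
# i.e. the cells a lower-triangle heatmap displays
rows, cols = np.tril_indices(len(corr), k=-1)
pairs = pd.DataFrame({
    'var_a': corr.index[rows],
    'var_b': corr.columns[cols],
    'r': corr.to_numpy()[rows, cols],
})

# Strongest relationships first, regardless of sign
ranked = pairs.sort_values('r', key=lambda s: s.abs(), ascending=False)
print(ranked)
```

With n variables the table has n(n-1)/2 rows, which is the number of annotated cells in the masked heatmap.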

3. Cross-Tabulation Heatmap

python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Sales heatmap: month × product
months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
products = ['Laptop', 'Phone', 'Monitor', 'Desk', 'Chair']
np.random.seed(42)
sales_matrix = np.random.randint(10, 120, size=(12, 5))
# Add seasonal pattern
sales_matrix[10:, :] = (sales_matrix[10:, :] * 1.8).astype(int)  # Nov-Dec spike
df_sales = pd.DataFrame(sales_matrix, index=months, columns=products)

fig, ax = plt.subplots(figsize=(12, 8))
sns.heatmap(df_sales, ax=ax,
             annot=True, fmt='d',
             cmap='YlOrRd',       # Sequential — more = darker
             linewidths=0.5, linecolor='white',
             cbar_kws={'label': 'Units Sold'})
ax.set_title('Monthly Sales by Product — Units Sold', fontsize=14, fontweight='bold')
ax.set_xlabel('Product', fontsize=12)
ax.set_ylabel('Month', fontsize=12)
plt.tight_layout()
plt.savefig('sales_heatmap.png', dpi=150)
plt.show()
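
The example above starts from a ready-made month × product matrix. Real sales data often arrives as one row per order, so a cross-tabulation step comes first. The following is a minimal sketch of that step using `pivot_table`; the `orders` frame, its column names, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format order records (illustrative values only)
orders = pd.DataFrame({
    'month':   ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar'],
    'product': ['Laptop', 'Phone', 'Laptop', 'Phone', 'Phone', 'Laptop'],
    'units':   [3, 5, 2, 4, 1, 6],
})

# Sum units per (month, product); combinations with no orders become 0
table = orders.pivot_table(index='month', columns='product',
                           values='units', aggfunc='sum', fill_value=0)

# Restore calendar order (the pivot sorts months alphabetically)
# and make sure the cells are integers so they can be annotated with fmt='d'
month_order = ['Jan', 'Feb', 'Mar']
table = table.reindex(month_order).astype(int)
print(table)
```

The resulting `table` has the same shape convention as `df_sales` (months as rows, products as columns), so it could be passed to `sns.heatmap` in the same way.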

4. Mini Project: Employee Analytics Heatmap

python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

np.random.seed(42)
depts = ['Engineering', 'Marketing', 'Sales', 'HR', 'Finance']
metrics = ['Satisfaction', 'Performance', 'Retention', 'Training_Hours', 'Absenteeism']

data = {
    'Engineering': [4.2, 4.5, 92, 40, 3],
    'Marketing':   [3.8, 3.9, 88, 30, 5],
    'Sales':       [3.5, 4.1, 78, 25, 8],
    'HR':          [4.0, 3.7, 90, 35, 4],
    'Finance':     [3.9, 4.0, 94, 32, 3]
}
df_hr = pd.DataFrame(data, index=metrics)

# Normalize each metric to 0-1 for fair comparison
df_norm = df_hr.copy().astype(float)
for metric in metrics:
    row = df_norm.loc[metric]
    if metric == 'Absenteeism':  # Lower is better
        df_norm.loc[metric] = 1 - (row - row.min()) / (row.max() - row.min())
    else:  # Higher is better
        df_norm.loc[metric] = (row - row.min()) / (row.max() - row.min())

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Raw values
sns.heatmap(df_hr, ax=ax1, annot=True, fmt='.1f', cmap='RdYlGn',
             linewidths=0.5, linecolor='white', cbar_kws={'shrink': 0.8})
ax1.set_title('HR Metrics by Department\n(Raw Values)', fontsize=12, fontweight='bold')

# Normalized (performance score)
sns.heatmap(df_norm, ax=ax2, annot=True, fmt='.2f', cmap='RdYlGn',
             vmin=0, vmax=1, linewidths=0.5, linecolor='white', cbar_kws={'shrink': 0.8})
ax2.set_title('Normalized Performance Score\n(0=worst, 1=best)', fontsize=12, fontweight='bold')

plt.suptitle('Employee Analytics Dashboard', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('hr_heatmap.png', dpi=150)
plt.show()

5. Common Mistakes

  • Using a sequential colormap for correlation: Correlation ranges from -1 to +1. Use a diverging colormap (such as RdBu or RdBu_r) so that zero maps to the neutral white midpoint rather than to the low end of the scale.
  • Not masking the upper triangle: A full symmetric correlation matrix shows every value twice. Use mask=np.triu(...) for a cleaner lower-triangle version.
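
The first mistake comes down to where zero lands on the color scale. The sketch below illustrates this with matplotlib's `Normalize` class, which maps data values to positions between 0 and 1 along a colormap. It is meant as an illustration of the idea rather than a description of seaborn's internals, and the `corr_values` array is hypothetical.

```python
import numpy as np
from matplotlib.colors import Normalize

corr_values = np.array([-0.2, 0.0, 0.4, 1.0])  # hypothetical correlations

# Fixed scale, as with vmin=-1, vmax=1: zero sits at the colormap midpoint
fixed = Normalize(vmin=-1, vmax=1)

# Data-driven scale: limits come from the data, so zero drifts off-center
auto = Normalize(vmin=corr_values.min(), vmax=corr_values.max())

zero_fixed = float(fixed(0.0))
zero_auto = float(auto(0.0))

print(zero_fixed)  # 0.5
print(zero_auto)   # well below 0.5 for this data
```

On a diverging colormap the midpoint is the neutral color, so only the fixed scale shows a zero correlation as neutral. With the data-driven limits, zero would be drawn in the color reserved for clearly negative values.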

6. MCQs

Question 1

Heatmap encodes values using?

Question 2

cmap='RdBu_r' for correlation is ideal because?

Question 3

vmin=-1, vmax=1 in correlation heatmap?

Question 4

mask=np.triu(...) hides?

Question 5

annot=True, fmt='.2f' shows?

Question 6

center=0 in heatmap with RdBu?

Question 7

Sequential colormap (YlOrRd) is best for?

Question 8

linewidths=0.5, linecolor='white' adds?

Question 9

Normalizing HR metrics to 0-1 enables?

Question 10

Heatmap is best for?

7. Interview Questions

  • Q: Why should you use a diverging colormap for a correlation heatmap?
  • Q: How do you normalize metrics for comparison in a heatmap?
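
For the normalization question, one possible answer is min-max scaling per metric, flipped for metrics where lower is better, as in the mini project. The sketch below wraps that idea in a helper. The function name `min_max`, its `higher_is_better` parameter, and the choice of 0.5 for constant metrics are assumptions made for illustration; the sample values are taken from the mini project's Retention and Absenteeism rows.

```python
import pandas as pd

def min_max(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Rescale a metric to [0, 1]; flip it when lower raw values are better."""
    span = values.max() - values.min()
    if span == 0:
        # Assumption: a metric with no variation gets a neutral 0.5
        # instead of the NaN that a division by zero would produce
        return pd.Series(0.5, index=values.index)
    scaled = (values - values.min()) / span
    return scaled if higher_is_better else 1 - scaled

retention = pd.Series({'Engineering': 92, 'Sales': 78, 'Finance': 94})
absenteeism = pd.Series({'Engineering': 3, 'Sales': 8, 'Finance': 3})

retention_score = min_max(retention)
absenteeism_score = min_max(absenteeism, higher_is_better=False)

print(retention_score)
print(absenteeism_score)
```

After scaling, 1 means "best in the group" for every metric, which is what lets a single color scale be read the same way across all rows of the heatmap.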

8. Summary

Heatmaps encode matrix data as color, which suits correlation matrices, cross-tabulations, and calendar heatmaps. Use RdBu_r (diverging) for correlations and YlOrRd (sequential) for counts/sales. Mask the upper triangle in symmetric correlation matrices so each value appears only once. Normalize heterogeneous metrics before comparative heatmaps.

9. Next Chapter Recommendation

In Chapter 16: Time Series Visualization, we master financial charting, moving averages, seasonal decomposition, and stock market analysis.
