Python for Data Science
CHAPTER 18 Beginner

Data Visualization with Matplotlib

Updated: May 18, 2026
5 min read

1. Chapter Introduction

Humans are visual creatures. A stakeholder will not understand a DataFrame containing 10,000 rows of aggregated statistics. You must translate that data into a story they can see. Matplotlib is the foundational data visualization library in Python. This chapter teaches you how to create the four most important charts: Line, Bar, Scatter, and Histogram.

2. Matplotlib Basics and Pyplot

Matplotlib is a large library, but most day-to-day plotting in data science goes through a single module within it called pyplot.

Installation & Importing:

bash
# Run in a terminal. Inside a Jupyter or Colab cell, prefix the command with ! or use %pip instead.
pip install matplotlib
python
# 'plt' is the standard alias
import matplotlib.pyplot as plt

3. The Line Chart (For Trends Over Time)

Line charts are well suited to time-series data (e.g., stock prices, monthly revenue).

python
import matplotlib.pyplot as plt

# Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [5000, 5200, 4800, 6000, 6500]

# 1. Create the plot
plt.plot(months, revenue, color='blue', marker='o') # 'o' adds dots at data points

# 2. Add labels and title (CRITICAL for good charts!)
plt.title("Monthly Revenue 2023")
plt.xlabel("Month")
plt.ylabel("Revenue in USD")
plt.grid(True) # Adds a background grid

# 3. Display the chart
plt.show()
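
When a chart needs more than one series, each plt.plot() call can take a label, and plt.legend() turns those labels into a key. The sketch below is illustrative only: the costs figures and the function name plot_revenue_and_costs are made up for this example and are not part of the chapter's dataset.

```python
import matplotlib.pyplot as plt


def plot_revenue_and_costs():
    """Draw two labelled lines on one chart and return the axes."""
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
    revenue = [5000, 5200, 4800, 6000, 6500]  # same values as the example above
    costs = [4000, 4100, 4300, 4200, 4600]    # hypothetical numbers for illustration

    plt.figure()  # start a fresh figure so earlier charts are not reused
    plt.plot(months, revenue, color='blue', marker='o', label='Revenue')
    plt.plot(months, costs, color='red', marker='s', label='Costs')

    plt.title("Monthly Revenue vs Costs")
    plt.xlabel("Month")
    plt.ylabel("USD")
    plt.grid(True)
    plt.legend()  # builds the key from the label= arguments
    return plt.gca()
```

Call plot_revenue_and_costs() and then plt.show() to display the chart.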

4. The Bar Chart (For Comparing Categories)

Bar charts are used to compare categorical data (e.g., Sales by Region, Population by Country).

python
regions = ['North', 'South', 'East', 'West']
sales = [350, 420, 200, 500]

plt.bar(regions, sales, color=['red', 'blue', 'green', 'orange'])
plt.title("Sales by Region")
plt.xlabel("Region")
plt.ylabel("Units Sold")
plt.show()
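
Comparisons are often easier to read when the bars are sorted and each one carries its value. The following is a minimal sketch of that idea, not the only way to do it; the helper name plot_sorted_sales is an assumption introduced here for illustration.

```python
import matplotlib.pyplot as plt


def plot_sorted_sales(regions, sales):
    """Draw bars from largest to smallest and write each value above its bar."""
    # Pair each region with its value, then sort by value, largest first.
    pairs = sorted(zip(regions, sales), key=lambda pair: pair[1], reverse=True)
    sorted_regions = [region for region, _ in pairs]
    sorted_sales = [value for _, value in pairs]

    plt.figure()
    bars = plt.bar(sorted_regions, sorted_sales, color='teal')

    # Place the numeric value just above each bar, centred horizontally.
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, height, f"{height:.0f}",
                 ha='center', va='bottom')

    plt.title("Sales by Region (sorted)")
    plt.xlabel("Region")
    plt.ylabel("Units Sold")
    return plt.gca()


ax = plot_sorted_sales(['North', 'South', 'East', 'West'], [350, 420, 200, 500])
```

Follow the call with plt.show() to display the result.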

5. The Scatter Plot (For Finding Correlations)

Scatter plots are used to visualize the relationship (correlation) between two numerical variables. For example, does spending more on ads increase sales?

python
ad_spend = [100, 200, 300, 400, 500]
sales_generated = [150, 300, 280, 550, 600]

plt.scatter(ad_spend, sales_generated, color='purple')
plt.title("Ad Spend vs Sales")
plt.xlabel("Ad Spend ($)")
plt.ylabel("Sales Generated ($)")
plt.show()

*(If the dots roughly follow a line rising from left to right, the two variables are positively correlated!)*
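
To put a number on what the scatter plot shows, one option is the Pearson correlation coefficient. This sketch assumes NumPy is installed and uses np.corrcoef on the same two lists; it is a quick check, not a full statistical analysis.

```python
import numpy as np

ad_spend = [100, 200, 300, 400, 500]
sales_generated = [150, 300, 280, 550, 600]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation between the two inputs.
r = np.corrcoef(ad_spend, sales_generated)[0, 1]
print(f"Pearson r = {r:.2f}")  # prints: Pearson r = 0.95
```

A value close to +1 indicates a strong positive linear relationship. On its own it does not show that ad spend causes the sales.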

6. The Histogram (For Data Distribution)

Histograms look like bar charts, but they serve a different purpose. They show the *distribution* of a single numerical variable. They tell you where most of your data "clumps" together.

python
import numpy as np

# Generate 1000 random test scores (normal distribution: mean 75, standard deviation 10)
scores = np.random.normal(75, 10, 1000)

# bins=20 means divide the data into 20 groups (bars)
plt.hist(scores, bins=20, color='skyblue', edgecolor='black')
plt.title("Distribution of Test Scores")
plt.xlabel("Score")
plt.ylabel("Frequency (Number of Students)")
plt.show()
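
plt.hist() also returns the numbers behind the picture: the count in each bin and the bin edges. The sketch below inspects them, assuming a seeded NumPy generator so the scores are reproducible (the seed 42 is an arbitrary choice for this example).

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator so the "random" scores are the same on every run.
rng = np.random.default_rng(42)
scores = rng.normal(75, 10, 1000)  # mean 75, standard deviation 10

plt.figure()
counts, bin_edges, _ = plt.hist(scores, bins=20, color='skyblue', edgecolor='black')

# 20 bins are bounded by 21 edges, and every score lands in exactly one bin.
print(len(counts), len(bin_edges), int(counts.sum()))  # prints: 20 21 1000
```

Each entry in counts is the height of one bar, so the heights always add up to the number of data points.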

7. Mini Project: Revenue Analytics Dashboard

We can combine multiple charts into one image using plt.subplots().

python
# Reuses months, revenue, regions, and sales from the earlier sections.
# Create a figure with 1 row and 2 columns of charts. Size is 12x5 inches.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Chart 1: Line Chart on the left (ax1)
ax1.plot(months, revenue, marker='o', color='green')
ax1.set_title("Revenue Trend")
ax1.set_ylabel("USD")

# Chart 2: Bar Chart on the right (ax2)
ax2.bar(regions, sales, color='teal')
ax2.set_title("Regional Sales")
ax2.set_ylabel("Units")

# Adjust spacing so they don't overlap, then show
plt.tight_layout()
plt.show()
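
The same approach extends to larger grids. As a sketch, the function below places all four chart types from this chapter in a 2x2 layout; build_dashboard is a hypothetical name, and the seed used for the test scores is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np


def build_dashboard():
    """Return (fig, axes) for a 2x2 grid holding all four chart types."""
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
    revenue = [5000, 5200, 4800, 6000, 6500]
    regions = ['North', 'South', 'East', 'West']
    sales = [350, 420, 200, 500]
    ad_spend = [100, 200, 300, 400, 500]
    sales_generated = [150, 300, 280, 550, 600]
    scores = np.random.default_rng(0).normal(75, 10, 1000)  # arbitrary seed

    # With 2 rows and 2 columns, plt.subplots returns a 2x2 array of axes,
    # indexed as axes[row, column].
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    axes[0, 0].plot(months, revenue, marker='o', color='green')
    axes[0, 0].set_title("Revenue Trend")

    axes[0, 1].bar(regions, sales, color='teal')
    axes[0, 1].set_title("Regional Sales")

    axes[1, 0].scatter(ad_spend, sales_generated, color='purple')
    axes[1, 0].set_title("Ad Spend vs Sales")

    axes[1, 1].hist(scores, bins=20, color='skyblue', edgecolor='black')
    axes[1, 1].set_title("Distribution of Test Scores")

    fig.tight_layout()  # adjust spacing so titles and labels do not overlap
    return fig, axes
```

Call build_dashboard() and then plt.show() to display the grid.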

8. Common Mistakes

  • Forgetting plt.show(): In a script, the chart window may never open without it. In a notebook the chart usually still renders, but the cell also prints the object's text representation (for example, [<matplotlib.lines.Line2D at 0x...>]) above the image.
  • Naked Charts: Creating a chart without a Title, X-label, and Y-label is a cardinal sin in data science. A chart without context is just abstract art.
  • Saving blank images: If you want to export a chart using plt.savefig('chart.png'), call it *before* plt.show(). Once plt.show() returns, the figure has typically been closed, so a later plt.savefig() writes a blank image.
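
The save-before-show order from the last point can be sketched as follows. The helper name save_then_show and its show flag are assumptions made for this example; the flag exists only so the function can also run where no display is available.

```python
import os

import matplotlib.pyplot as plt


def save_then_show(path, show=True):
    """Save the chart to `path` first, then optionally display it."""
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
    revenue = [5000, 5200, 4800, 6000, 6500]

    plt.figure()
    plt.plot(months, revenue, color='blue', marker='o')
    plt.title("Monthly Revenue 2023")
    plt.xlabel("Month")
    plt.ylabel("Revenue in USD")

    # 1. Save while the figure still holds the drawn chart.
    plt.savefig(path, dpi=150)

    # 2. Only then display it.
    if show:
        plt.show()

    # Return the file size in bytes so the caller can confirm a file was written.
    return os.path.getsize(path)
```

For example, save_then_show('chart.png') writes the PNG and then opens the chart.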

9. MCQs

Question 1

What is the standard alias for importing Matplotlib's plotting module?

Question 2

Which chart type is best for displaying trends over time (e.g., daily stock price)?

Question 3

Which chart type is best for finding the correlation between two numerical variables?

Question 4

Which chart type is best for showing the distribution/frequency of a single variable (e.g., Age demographics)?

Question 5

How do you add a title to a Matplotlib chart?

Question 6

What does plt.show() do?

Question 7

If you want to put two charts side-by-side in a single image, what function do you use?

Question 8

What parameter in plt.hist() controls how many bars/groups the data is divided into?

Question 9

How do you add a label to the horizontal axis?

Question 10

Why must plt.savefig() be called BEFORE plt.show()?

10. Interview Questions

  • Q: A stakeholder wants to know if there is a relationship between a customer's Age and their Total Spend. What type of chart would you build and why?
  • Q: Explain the difference between a Bar Chart and a Histogram.

11. Summary

Data Visualization is about communication. Use plt.plot() for time-series, plt.bar() for categories, plt.scatter() for correlations, and plt.hist() for distributions. Always dress your charts properly with .title(), .xlabel(), and .ylabel(). Use plt.subplots() to build professional dashboards, and remember to save your figures before showing them.

12. Next Chapter Recommendation

In Chapter 19: Statistical Visualization with Seaborn, we will explore a library built on top of Matplotlib that produces more polished charts with much less code and works directly with Pandas DataFrames.
