Data Visualization with Matplotlib
# CHAPTER 18
1. Chapter Introduction
Humans are visual creatures. A stakeholder will not understand a DataFrame containing 10,000 rows of aggregated statistics. You must translate that data into a story they can see. Matplotlib is the foundational data visualization library in Python. This chapter teaches you how to create the four most important charts: Line, Bar, Scatter, and Histogram.
2. Matplotlib Basics and Pyplot
Matplotlib is massive, but for most day-to-day plotting, data scientists only use one module within it: pyplot.
Installation & Importing:
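A minimal sketch of the usual setup, assuming you install packages with pip (your environment may use conda or another tool instead). The alias plt is a community convention, not a requirement.

```python
# Install once from a terminal (not from inside Python):
#   pip install matplotlib

# Import the pyplot module under its conventional alias
import matplotlib.pyplot as plt
```

Every later example in this style calls functions through plt, such as plt.plot() or plt.title().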
3. The Line Chart (For Trends over Time)
Line charts are perfect for showing time-series data (e.g., stock prices, monthly revenue).
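As a rough illustration, a monthly revenue trend could be drawn like this. The month names and revenue figures are made up for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly revenue figures, invented for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12000, 13500, 12800, 15200, 16100, 17400]

# plt.plot() joins the points in order, which suits data ordered in time
plt.plot(months, revenue, marker="o")
plt.title("Monthly Revenue")
plt.xlabel("Month")
plt.ylabel("Revenue (USD)")
plt.show()
```

The marker="o" argument is optional; it draws a dot at each data point so individual months are easy to spot.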
4. The Bar Chart (For Comparing Categories)
Bar charts are used to compare categorical data (e.g., Sales by Region, Population by Country).
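A short sketch of a Sales by Region bar chart follows. The region names and sales totals are invented for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical sales totals per region, invented for illustration
regions = ["North", "South", "East", "West"]
sales = [240, 310, 180, 275]

# plt.bar() draws one bar per category; bar height is the value
plt.bar(regions, sales)
plt.title("Sales by Region")
plt.xlabel("Region")
plt.ylabel("Sales (units)")
plt.show()
```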
5. The Scatter Plot (For Finding Correlations)
Scatter plots are used to visualize the relationship (correlation) between two numerical variables. For example, does spending more on ads increase sales?
*(If the dots form a line going up and to the right, you have a positive correlation!)*
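The ad-spend question above might be explored with a sketch like this. The numbers are invented and deliberately chosen to show an upward pattern; real data will be noisier.

```python
import matplotlib.pyplot as plt

# Hypothetical data: ad spend and resulting sales, both in thousands of USD
ad_spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [12, 15, 19, 22, 24, 30, 31, 36]

# plt.scatter() draws one dot per (x, y) pair without connecting them
plt.scatter(ad_spend, sales)
plt.title("Ad Spend vs. Sales")
plt.xlabel("Ad Spend (thousands USD)")
plt.ylabel("Sales (thousands USD)")
plt.show()
```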
6. The Histogram (For Data Distribution)
Histograms look like bar charts, but they serve a different purpose. They show the *distribution* of a single numerical variable. They tell you where most of your data "clumps" together.
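A minimal sketch of a histogram of customer ages. The ages are randomly generated for the example (seeded so the result is repeatable), and the choice of 10 bins is arbitrary.

```python
import random

import matplotlib.pyplot as plt

# Hypothetical ages: 200 values drawn around a mean of 35 (illustration only)
rng = random.Random(42)
ages = [rng.gauss(35, 10) for _ in range(200)]

# bins controls how many groups the value range is divided into.
# plt.hist() returns the count per bin and the bin edges.
counts, bin_edges, _ = plt.hist(ages, bins=10)
plt.title("Distribution of Customer Ages")
plt.xlabel("Age")
plt.ylabel("Number of Customers")
plt.show()
```

Unlike a bar chart, the horizontal axis here is a continuous numeric range: each bar covers an interval of ages, and its height is how many values fell inside that interval.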
7. Mini Project: Revenue Analytics Dashboard
We can combine multiple charts into one image using plt.subplots().
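One possible layout for such a dashboard is sketched below: a line chart and a bar chart side by side. All data, titles, and the figure size are assumptions made for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12000, 13500, 12800, 15200, 16100, 17400]
regions = ["North", "South", "East", "West"]
region_sales = [240, 310, 180, 275]

# One row, two columns: axes[0] is the left chart, axes[1] the right
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Left panel: revenue trend over time
axes[0].plot(months, revenue, marker="o")
axes[0].set_title("Monthly Revenue Trend")
axes[0].set_xlabel("Month")
axes[0].set_ylabel("Revenue (USD)")

# Right panel: comparison across categories
axes[1].bar(regions, region_sales)
axes[1].set_title("Sales by Region")
axes[1].set_xlabel("Region")
axes[1].set_ylabel("Sales (units)")

fig.suptitle("Revenue Analytics Dashboard")
plt.tight_layout()  # keep labels and titles from overlapping
plt.show()
```

Note that each panel is an Axes object, so labels are set with set_title(), set_xlabel(), and set_ylabel() rather than plt.title(), plt.xlabel(), and plt.ylabel().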
8. Common Mistakes
- Forgetting plt.show(): Depending on your environment, if you don't call plt.show(), the chart might not render, or you'll see ugly memory address text above the image.
- Naked Charts: Creating a chart without a Title, X-label, and Y-label is a cardinal sin in data science. A chart without context is just abstract art.
- Saving blank images: If you want to export a chart using plt.savefig('chart.png'), you MUST call it *before* plt.show(). In most environments the figure is discarded once plt.show() has finished displaying it, so a later plt.savefig() writes a blank image.
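The safe ordering for the last point can be sketched as follows: label the chart, save it, then show it. The data is invented, and chart.png is written to the current working directory.

```python
import matplotlib.pyplot as plt

# Hypothetical quarterly profit figures, invented for illustration
quarters = ["Q1", "Q2", "Q3", "Q4"]
profit = [40, 55, 48, 62]

plt.bar(quarters, profit)
plt.title("Quarterly Profit")
plt.xlabel("Quarter")
plt.ylabel("Profit (thousands USD)")

# Save FIRST, while the figure still holds the chart...
plt.savefig("chart.png")
# ...then display it
plt.show()
```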
9. MCQs
What is the standard alias for importing Matplotlib's plotting module?
Which chart type is best for displaying trends over time (e.g., daily stock price)?
Which chart type is best for finding the correlation between two numerical variables?
Which chart type is best for showing the distribution/frequency of a single variable (e.g., Age demographics)?
How do you add a title to a Matplotlib chart?
What does plt.show() do?
If you want to put two charts side-by-side in a single image, what function do you use?
What parameter in plt.hist() controls how many bars/groups the data is divided into?
How do you add a label to the horizontal axis?
Why must plt.savefig() be called BEFORE plt.show()?
10. Interview Questions
- Q: A stakeholder wants to know if there is a relationship between a customer's Age and their Total Spend. What type of chart would you build and why?
- Q: Explain the difference between a Bar Chart and a Histogram.
11. Summary
Data Visualization is about communication. Use plt.plot() for time-series, plt.bar() for categories, plt.scatter() for correlations, and plt.hist() for distributions. Always dress your charts properly with plt.title(), plt.xlabel(), and plt.ylabel(). Use plt.subplots() to build professional dashboards, and remember to save your figures before showing them.