CHAPTER 16 Beginner

Data Visualization with ggplot2

Updated: May 18, 2026


1. Chapter Introduction

ggplot2 implements the Grammar of Graphics, an approach that describes every chart as a stack of layers built from data, aesthetic mappings, and geometries. Once you understand this grammar, you can build almost any chart in a consistent, logical way.
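A ggplot object is an ordinary R value that collects layers as you add them with `+`. The sketch below uses the built-in `mtcars` data to build a plot one layer at a time and then inspect it; the variable name `p` is just illustrative.

```r
library(ggplot2)

# Data + aesthetic mapping only: no geometry yet, so nothing would be drawn
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Each `+` returns a new ggplot object with one more layer
p <- p + geom_point()                                  # layer 1: points
p <- p + geom_smooth(method = "lm", formula = y ~ x)   # layer 2: fitted line

length(p$layers)   # → 2
print(p)           # draws the plot on the current graphics device
```

Because the plot is just a value, you can store a base plot and add different layers, scales, or themes to it later.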

2. Grammar of Graphics Fundamentals

r
library(ggplot2)
library(dplyr)

# ggplot2 FORMULA:
# ggplot(data, aes(x, y, color, fill, size)) +
#   geom_*() +
#   scale_*() +
#   labs() +
#   theme_*()

# Aesthetics (aes) map data to visual properties:
#   x, y       → position
#   color      → outline/line color
#   fill       → area fill color
#   size       → point and text size (lines use linewidth in ggplot2 >= 3.4.0)
#   shape      → point shape
#   alpha      → transparency (0=transparent, 1=opaque)
#   linetype   → solid, dashed, dotted

# Sample data
set.seed(42)
sales_df <- data.frame(
  month   = rep(month.abb, 2),
  year    = rep(c(2023, 2024), each=12),
  revenue = c(round(runif(12, 40000, 90000), -3),
              round(runif(12, 50000, 100000), -3)),
  region  = rep(c("North","South","East","West","North","North",
                   "South","East","West","North","East","South"), 2)
)
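To see what a mapping actually does, you can ask ggplot2 to compute a plot without drawing it. `layer_data()` runs the statistics and scales for one layer and returns the resulting data frame. In this sketch (on `mtcars`), mapping `color` to `factor(cyl)` yields one color per cylinder group.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point()

# Build the plot (stats + scales) without drawing it; layer 1 by default
layer_df <- layer_data(p)

# The mapped `colour` column now holds hex codes chosen by the color scale
head(layer_df[, c("x", "y", "colour")])

# One color per level of factor(cyl): 4, 6 and 8 cylinders
length(unique(layer_df$colour))   # → 3
```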

3. Essential Chart Types

r
# ─── SCATTER PLOT ────────────────────────────────────
ggplot(mtcars, aes(x=hp, y=mpg)) +
  # Map color and size in geom_point() only, so geom_smooth() fits one overall line
  geom_point(aes(color=factor(cyl), size=wt), alpha=0.7) +
  geom_smooth(method="lm", formula=y ~ x, se=TRUE, color="black", linetype="dashed") +
  scale_color_manual(values=c("4"="#1565C0","6"="#E65100","8"="#B71C1C"),
                     name="Cylinders") +
  labs(title="Horsepower vs Fuel Efficiency",
       subtitle="Color = cylinders, Size = weight",
       x="Horsepower", y="Miles per Gallon",
       caption="Source: mtcars dataset") +
  theme_bw() +
  theme(plot.title=element_text(size=14, face="bold"),
        legend.position="bottom")

# ─── BAR CHART ───────────────────────────────────────
region_revenue <- sales_df %>%
  group_by(region) %>%
  summarise(total=sum(revenue), .groups="drop") %>%
  arrange(desc(total))

ggplot(region_revenue, aes(x=reorder(region, total), y=total, fill=region)) +
  geom_col(show.legend=FALSE, width=0.7) +
  geom_text(aes(label=paste0("$", formatC(total/1000, format="f", digits=0),"K")),
             hjust=-0.1, fontface="bold") +
  coord_flip() +
  scale_fill_manual(values=c("#1565C0","#2E7D32","#E65100","#6A1B9A")) +
  scale_y_continuous(limits=c(0, max(region_revenue$total)*1.15),
                     labels=function(x) paste0("$",x/1000,"K")) +
  labs(title="Total Revenue by Region", x=NULL, y="Revenue") +
  theme_minimal(base_size=12) +
  theme(panel.grid.major.y=element_blank(),
        plot.title=element_text(face="bold"))

# ─── LINE CHART ──────────────────────────────────────
monthly_rev <- sales_df %>%
  mutate(month_order = match(month, month.abb)) %>%
  arrange(year, month_order)

ggplot(monthly_rev, aes(x=month_order, y=revenue, color=factor(year), group=year)) +
  geom_line(linewidth=1.5) +   # linewidth replaced size for lines in ggplot2 3.4.0
  geom_point(size=3) +
  scale_x_continuous(breaks=1:12, labels=month.abb) +
  scale_y_continuous(labels=function(x) paste0("$",x/1000,"K")) +
  scale_color_manual(values=c("2023"="#90CAF9","2024"="#1565C0"), name="Year") +
  labs(title="Monthly Revenue: 2023 vs 2024", x="Month", y="Revenue") +
  theme_minimal() + theme(legend.position="top")

# ─── HISTOGRAM ───────────────────────────────────────
ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=15, fill="#1565C0", color="white", alpha=0.8) +
  geom_vline(xintercept=mean(mtcars$mpg), color="red", linetype="dashed", linewidth=1) +
  labs(title="Distribution of Fuel Efficiency", x="MPG", y="Count") +
  theme_minimal()

# ─── BOX PLOT ────────────────────────────────────────
ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) +
  geom_boxplot(alpha=0.7, outlier.color="red") +
  geom_jitter(width=0.1, height=0, alpha=0.4) +   # height=0: jitter sideways only, keep true mpg
  scale_fill_manual(values=c("#42A5F5","#1565C0","#0D47A1")) +
  labs(title="Fuel Efficiency by Number of Cylinders",
       x="Cylinders", y="Miles per Gallon") +
  theme_minimal() + theme(legend.position="none")
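The bar and line charts above build their axis labels with a hand-written `paste0()` function. As an alternative, the scales package (a ggplot2 dependency, so it is already installed) provides ready-made labellers. This sketch assumes scales 1.1.0 or later, where `label_dollar()` accepts `scale` and `suffix` arguments; in recent scales releases `label_dollar()` is superseded by `label_currency()` but still works. The region totals are made-up numbers for illustration.

```r
library(ggplot2)
library(scales)

# Labeller: divide by 1,000, prefix "$", append "K" (50000 becomes "$50K")
k_dollars <- label_dollar(scale = 1e-3, suffix = "K")
k_dollars(c(50000, 125000))   # → "$50K" "$125K"

# Hypothetical toy data, only for illustration
df <- data.frame(region = c("North", "South", "East", "West"),
                 total  = c(410000, 260000, 300000, 180000))

# reorder() sorts the regions by total, as in the bar chart above
p_bar <- ggplot(df, aes(x = reorder(region, total), y = total)) +
  geom_col(fill = "#1565C0", width = 0.7) +
  coord_flip() +
  scale_y_continuous(labels = k_dollars) +
  labs(x = NULL, y = "Revenue")
p_bar
```

A named labeller like `k_dollars` can be reused across every chart in a report, which keeps axis formatting consistent.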

4. Faceting and Themes

r
# Facet wrap: small multiples, one panel per region
p_facet <- ggplot(sales_df, aes(x=match(month, month.abb), y=revenue)) +
  geom_col(fill="#1565C0", alpha=0.8) +
  facet_wrap(~region, nrow=2) +
  scale_x_continuous(breaks=1:12, labels=month.abb) +
  labs(title="Monthly Revenue by Region", x="Month", y="Revenue") +
  theme_minimal() +
  theme(axis.text.x=element_text(angle=45, hjust=1))
p_facet

# Custom reusable theme (a theme object, not a function)
my_theme <- theme_minimal(base_size=12) +
  theme(
    plot.title       = element_text(face="bold", size=14, color="#1565C0"),
    plot.subtitle    = element_text(color="#666666"),
    axis.title       = element_text(face="bold"),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color="#EEEEEE"),
    strip.background = element_rect(fill="#F5F5F5"),
    strip.text       = element_text(face="bold")
  )

# Apply it: my_theme is a complete theme, so it replaces earlier theme settings;
# re-add the rotated axis labels afterwards
p_facet <- p_facet + my_theme +
  theme(axis.text.x=element_text(angle=45, hjust=1))

# Save plot (pass plot= explicitly; otherwise ggsave() saves last_plot())
ggsave("revenue_chart.png", plot=p_facet, width=10, height=6, dpi=300)
ggsave("revenue_chart.pdf", plot=p_facet, width=10, height=6)  # Vector format
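`ggsave()` writes the last plot displayed unless you pass `plot=` explicitly, and a complete theme such as `theme_minimal()` (or a custom theme built on one, like `my_theme`) replaces any `theme()` tweaks added before it. The sketch below demonstrates both on `mtcars`, writing to a temporary PDF so nothing lands in your working directory; the object names are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl, nrow = 1) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))   # tweak

# A complete theme REPLACES the theme set so far, so the tweak is lost here...
p_lost <- p + theme_minimal()

# ...and must be re-applied after the complete theme
p_kept <- p + theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Save a specific plot object rather than relying on last_plot()
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p_kept, width = 10, height = 6)   # size in inches
file.exists(out_file)
```

Naming the plot in every `ggsave()` call matters most in scripts that build several charts, where "the last plot" is easy to get wrong.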

5. Common Mistakes

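  • aes(color="blue") vs color="blue" outside aes(): Inside aes(), "blue" is treated as a data value, not a color. ggplot2 creates a one-level category named "blue", colors it with the default palette (not blue) and adds a legend for it. Set fixed colors outside aes(): geom_point(color="blue").
  • geom_bar() vs geom_col(): geom_bar() counts the rows for each x value (its default is stat="count"). geom_col() plots the y-values directly (stat="identity"). Use geom_col() when your data already contains the bar heights.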
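Both mistakes are easy to confirm with `layer_data()`, which returns the data ggplot2 computes for a layer without drawing it. The sketch below uses `mtcars` plus a small made-up `orders` table (hypothetical data, for illustration only).

```r
library(ggplot2)

# Mapping vs setting a color
mapped <- ggplot(mtcars, aes(x = wt, y = mpg, color = "blue")) + geom_point()
fixed  <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(color = "blue")

unique(layer_data(mapped)$colour)   # a default palette color, not blue
unique(layer_data(fixed)$colour)    # "blue"

# geom_bar() counts rows; geom_col() uses y as the bar height
orders <- data.frame(region  = c("North", "North", "South"),
                     revenue = c(10, 20, 5))

bar_df <- layer_data(ggplot(orders, aes(x = region)) + geom_bar())
bar_df$count   # rows per region: 2 for North, 1 for South

col_df <- layer_data(ggplot(orders, aes(x = region, y = revenue)) + geom_col())
# The two North rows are stacked, so the top of the North bar is 10 + 20
max(col_df$ymax[as.numeric(col_df$x) == 1])   # → 30
```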

6. MCQs

Question 1

In ggplot2, aes() maps?

Question 2

geom_col() vs geom_bar()?

Question 3

facet_wrap(~region) creates?

Question 4

reorder(x, y) in aes() sorts?

Question 5

alpha=0.5 in geom_point()?

Question 6

theme_minimal() provides?

Question 7

ggsave("plot.png", dpi=300) creates?

Question 8

scale_y_continuous(labels=scales::comma) formats the y-axis as?

Question 9

geom_smooth(method="lm") adds?

Question 10

coord_flip() in a bar chart?

7. Interview Questions

  • Q: What is the Grammar of Graphics and how does ggplot2 implement it?
  • Q: When would you use geom_col() vs geom_bar()?

8. Summary

ggplot2 grammar: data + aesthetics (aes()) + geometry (geom_*()) + scales + theme. Core geometries: geom_point(), geom_col(), geom_line(), geom_histogram(), geom_boxplot(). Colors: inside aes() for mapped colors, outside for fixed. facet_wrap() for small multiples. theme_minimal() for clean defaults. ggsave() for high-res export.

9. Next Chapter Recommendation

In Chapter 17: Statistical Analysis in R, we compute descriptive statistics, work with probability, and build professional statistical reports.
