CHAPTER 10 Beginner

Lists and Data Frames

Updated: May 18, 2026

Lists and Data Frames in R

1. Chapter Introduction

Lists hold values of mixed types; data frames hold tabular data, with one column per variable. Most practical R data analysis rests on these two structures. This chapter covers both and then builds a small employee records management system.
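
As a quick orientation, the sketch below shows how the two structures relate: a data frame is stored as a list whose elements are equal-length column vectors, so list tools also work on it. The names and values (`person`, `df`) are made up for illustration.

```r
# A list can mix types and lengths freely (illustrative values)
person <- list(name = "Alice", scores = c(90, 85, 77), active = TRUE)

# A data frame is a list of equal-length columns
df <- data.frame(name = c("Alice", "Bob"), age = c(32, 41))

is.list(person)        # TRUE
is.list(df)            # TRUE: a data frame is also a list
is.data.frame(person)  # FALSE
length(df)             # 2 (number of columns, not rows)
sapply(df, class)      # class of each column
```

Because `length()` counts columns, use `nrow()` when you need the number of records.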

2. Lists

# Lists can hold ANY type — including other lists
employee <- list(
  name       = "Alice Johnson",
  age        = 32,
  salary     = 85000.50,
  skills     = c("R", "Python", "SQL", "Tableau"),
  is_active  = TRUE,
  address    = list(city="New York", state="NY", zip="10001")
)

# Accessing list elements
employee$name           # "Alice Johnson"
employee[["salary"]]    # 85000.5
employee[[4]]           # c("R", "Python", "SQL", "Tableau")
employee$skills[1]      # "R" (first skill)
employee$address$city   # "New York" (nested list)

# Modify list
employee$salary <- 92000        # Update existing
employee$department <- "Data"   # Add new field
employee$skills <- c(employee$skills, "Shiny")  # Append

# List operations
length(employee)     # Number of top-level elements
names(employee)      # Names of all elements
str(employee)        # Compact structure display

# sapply() over a list: each department's share of total headcount (in %)
dept_sizes <- list(IT=45, HR=12, Finance=28, Marketing=19, Sales=65)
total <- sum(unlist(dept_sizes))
percentages <- sapply(dept_sizes, function(x) round(x/total * 100, 1))
print(percentages)
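
The listing above uses `sapply()`; the sketch below contrasts it with `lapply()` and `vapply()` on the same `dept_sizes` list. The doubling function is only a placeholder to make the return types visible.

```r
dept_sizes <- list(IT = 45, HR = 12, Finance = 28, Marketing = 19, Sales = 65)

# lapply() always returns a list, one element per input element
doubled_list <- lapply(dept_sizes, function(x) x * 2)
class(doubled_list)   # "list"

# sapply() simplifies to a named vector when every result has length 1
doubled_vec <- sapply(dept_sizes, function(x) x * 2)
class(doubled_vec)    # "numeric"
doubled_vec[["IT"]]   # 90

# vapply() does the same but checks each result against a declared template
doubled_safe <- vapply(dept_sizes, function(x) x * 2, numeric(1))
identical(doubled_vec, doubled_safe)  # TRUE
```

In scripts, `vapply()` is often the safer choice because it fails early if a result has an unexpected type or length, whereas `sapply()` silently changes its return shape.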

3. Data Frames

# Data frame: table with named columns of potentially different types
employees <- data.frame(
  id         = 1:6,
  name       = c("Alice", "Bob", "Carol", "David", "Eve", "Frank"),
  dept       = c("IT", "HR", "IT", "Finance", "IT", "HR"),
  salary     = c(85000, 55000, 78000, 92000, 95000, 52000),
  years_exp  = c(5, 2, 3, 8, 10, 1),
  is_senior  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Data frame properties
dim(employees)      # 6 6 (rows, columns)
nrow(employees)     # 6
ncol(employees)     # 6 (stringsAsFactors is an argument, not a column)
names(employees)    # column names
str(employees)      # structure overview
summary(employees)  # statistical summary of all columns

# Accessing columns
employees$name          # Column as vector
employees[, "salary"]   # Same
employees[, 4]          # By position

# Accessing rows
employees[1, ]          # First row
employees[2:4, ]        # Rows 2-4
employees[employees$dept == "IT", ]   # Filter: IT dept only
employees[employees$salary > 80000, c("name", "salary")]  # Filter + select
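
Two details of bracket indexing tend to surprise beginners: selecting a single column drops the result to a vector, and a logical filter containing `NA` returns an all-`NA` row. The sketch below illustrates both on a reduced version of the table; the `NA` salary is added here purely for demonstration and is not part of the chapter's data.

```r
employees <- data.frame(
  name   = c("Alice", "Bob", "Carol"),
  dept   = c("IT", "HR", "IT"),
  salary = c(85000, NA, 78000)   # NA added to illustrate missing data
)

# Selecting one column with [ , ] drops to a vector by default
class(employees[, "salary"])                # "numeric"

# drop = FALSE keeps the result as a one-column data frame
class(employees[, "salary", drop = FALSE])  # "data.frame"

# A logical filter containing NA produces an all-NA row
nrow(employees[employees$salary > 80000, ])         # 2 (Alice plus an NA row)

# which() discards NA comparisons, keeping only true matches
nrow(employees[which(employees$salary > 80000), ])  # 1
```

`subset()` also drops rows where the condition is `NA`, which is one reason it is convenient for interactive filtering.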

4. Data Frame Manipulation

# Add new column
employees$bonus <- employees$salary * 0.10
employees$total_comp <- employees$salary + employees$bonus

# Add new row
new_emp <- data.frame(
  id=7, name="Grace", dept="IT", salary=88000,
  years_exp=4, is_senior=FALSE, bonus=8800, total_comp=96800
)
employees <- rbind(employees, new_emp)

# Modify values (bonus and total_comp are stored values; they are not recalculated)
employees$salary[employees$name == "Bob"] <- 58000  # Update Bob's salary

# Remove column
employees$is_senior <- NULL

# Rename column
names(employees)[names(employees) == "years_exp"] <- "experience"

# Sort data frame
employees <- employees[order(employees$salary, decreasing=TRUE), ]

# Subset (base R)
it_dept <- subset(employees, dept == "IT" & salary > 75000,
                   select = c(name, salary, experience))
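
One consequence of the steps above is worth spelling out: columns such as `bonus` are computed once and stored, so changing `salary` afterwards leaves them stale. A minimal sketch of recomputing them, using a two-row table invented for brevity:

```r
employees <- data.frame(
  name   = c("Alice", "Bob"),
  salary = c(85000, 55000)
)
employees$bonus <- employees$salary * 0.10

# Updating salary does not touch the stored bonus
employees$salary[employees$name == "Bob"] <- 58000
employees$bonus[employees$name == "Bob"]   # still 5500

# Recompute derived columns explicitly; transform() returns a modified copy
employees <- transform(employees, bonus = salary * 0.10)
employees$total_comp <- employees$salary + employees$bonus
employees$bonus[employees$name == "Bob"]   # 5800
```

`total_comp` is assigned in a separate step because `transform()` evaluates its arguments against the original columns, so a column created in the same call is not yet visible to the others.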

5. Mini Project: Employee Records System

# ─── EMPLOYEE RECORDS SYSTEM ─────────────────────────
set.seed(123)
depts <- c("Engineering", "Marketing", "Sales", "HR", "Finance")
n <- 30

emp_db <- data.frame(
  id       = 1:n,
  name     = paste0("Employee_", sprintf("%02d", 1:n)),
  dept     = sample(depts, n, replace=TRUE),
  salary   = round(runif(n, 40000, 120000), -3),
  rating   = round(runif(n, 1, 5), 1),
  tenure   = sample(1:15, n, replace=TRUE)
)

emp_db$grade <- ifelse(emp_db$rating >= 4.5, "S",
                 ifelse(emp_db$rating >= 3.5, "A",
                 ifelse(emp_db$rating >= 2.5, "B", "C")))
emp_db$bonus <- round(emp_db$salary * ifelse(emp_db$grade == "S", 0.20,
                                       ifelse(emp_db$grade == "A", 0.15,
                                       ifelse(emp_db$grade == "B", 0.10, 0.05))), -2)

cat("=== EMPLOYEE RECORDS SYSTEM ===\n\n")
cat("Total Employees:", nrow(emp_db), "\n")
cat("Total Payroll:  $", format(sum(emp_db$salary), big.mark=","), "\n")
cat("Avg Salary:     $", format(round(mean(emp_db$salary)), big.mark=","), "\n\n")

# Department summary
dept_summary <- aggregate(
  cbind(salary, bonus, rating) ~ dept,
  data = emp_db,
  FUN  = function(x) round(mean(x), 2)
)
cat("Department Summary:\n")
print(dept_summary)

# Top performers
top5 <- head(emp_db[order(-emp_db$rating), c("name","dept","salary","rating","grade")], 5)
cat("\nTop 5 Performers:\n")
print(top5)

# Grade distribution
cat("\nGrade Distribution:\n")
print(table(emp_db$grade))
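
The nested `ifelse()` calls used for grading work, but they become hard to read as bands are added. As one possible alternative, not the method used in the project, base R's `cut()` can assign the same bands, and a named lookup vector can replace the second `ifelse()` chain. The `rating` values below are chosen to hit each boundary.

```r
rating <- c(1.0, 2.5, 3.4, 3.5, 4.4, 4.5, 5.0)   # illustrative ratings

# Nested ifelse(), as in the project
grade_ifelse <- ifelse(rating >= 4.5, "S",
                ifelse(rating >= 3.5, "A",
                ifelse(rating >= 2.5, "B", "C")))

# cut() with right = FALSE makes intervals closed on the left: [2.5, 3.5) etc.
grade_cut <- cut(rating,
                 breaks = c(-Inf, 2.5, 3.5, 4.5, Inf),
                 labels = c("C", "B", "A", "S"),
                 right  = FALSE)

# cut() returns a factor; convert to character to compare
identical(grade_ifelse, as.character(grade_cut))  # TRUE

# Named lookup vector mapping each grade to its bonus rate
bonus_rate <- c(S = 0.20, A = 0.15, B = 0.10, C = 0.05)
unname(bonus_rate[grade_ifelse])
```

The `right = FALSE` argument matters: with the default `right = TRUE`, a rating of exactly 4.5 would fall into the "A" band instead of "S".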

6. Common Mistakes

employee[4] vs employee[[4]]: [4] returns a list containing one element, while [[4]] returns the element itself. Use [[ ]] (or $name) when you need the contents, for example to pass them to a function that expects a vector.
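
The difference is easiest to see by inspecting both results. This sketch uses a shortened version of the `employee` list, so the skills vector sits at position 3 rather than 4.

```r
employee <- list(name = "Alice Johnson", age = 32,
                 skills = c("R", "Python", "SQL"))

sub  <- employee[3]     # single bracket: a list of length 1
item <- employee[[3]]   # double bracket: the character vector itself

class(sub)     # "list"
class(item)    # "character"
length(sub)    # 1
length(item)   # 3

# Vector functions need the element, not the wrapping list
toupper(item)              # "R" "PYTHON" "SQL"
mean(employee[["age"]])    # 32
# mean(employee["age"])    # warning, returns NA: the argument is a list
```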

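stringsAsFactors default: Before R 4.0, data.frame() converted character columns to factors by default (stringsAsFactors = TRUE). From R 4.0.0 onward the default is FALSE. Set the argument explicitly when code must behave the same on older and newer versions.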
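
To make the practical effect concrete, the sketch below builds the same column both ways and then tries to assign a new department name. The data frame names `chr_df` and `fct_df` are illustrative.

```r
chr_df <- data.frame(dept = c("IT", "HR", "IT"), stringsAsFactors = FALSE)
fct_df <- data.frame(dept = c("IT", "HR", "IT"), stringsAsFactors = TRUE)

class(chr_df$dept)    # "character"
class(fct_df$dept)    # "factor"
levels(fct_df$dept)   # "HR" "IT" (sorted alphabetically)

# Assigning a value outside a factor's levels yields NA with a warning
fct_df$dept[1] <- "Finance"
is.na(fct_df$dept[1])   # TRUE

# Character columns accept any string
chr_df$dept[1] <- "Finance"
chr_df$dept[1]          # "Finance"
```

Setting the argument explicitly, as in both calls here, keeps the behavior the same regardless of which R version runs the script.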

7. MCQs

Question 1

How does `df$column` differ from `df[["column"]]`?

Question 2

How does `list[[2]]` differ from `list[2]`?

Question 3

What does `str(df)` show?

Question 4

How does `rbind()` combine data frames?

Question 5

What does `aggregate(salary ~ dept, data, mean)` compute?

Question 6

What does `df$col <- NULL` do?

Question 7

What is `subset(df, cond, select=cols)` equivalent to?

Question 8

What does `names(df)[3] <- "new_name"` rename?

Question 9

What does `summary(df)` show for numeric columns?

Question 10

How is a data frame ordered when it is sorted with `order(-salary)`?

8. Interview Questions

Q: What is the difference between a list and a data frame in R?

Q: How do you filter and select specific columns from a data frame?

9. Summary

Lists are ordered collections that can hold any type; access elements with $name or [[i]]. Data frames are rectangular tables with named columns and are R's primary structure for data analysis. Key operations: subset(), aggregate(), rbind(), cbind(), order(). Use str() for a structure overview and summary() for summary statistics. df$col <- NULL removes a column. In R 4.0+, strings are not converted to factors by default.

10. Next Chapter Recommendation

In Chapter 11: Working with Strings in R, we master string manipulation using both base R and the stringr package from the tidyverse.
