R Programming
CHAPTER 10 Beginner

Lists and Data Frames

Updated: May 18, 2026

1. Chapter Introduction

Lists hold values of mixed types; data frames hold tabular data whose columns may differ in type. Most data analysis work in R builds on these two structures. This chapter covers both and then builds a small employee records management system.

2. Lists

r
# Lists can hold ANY type — including other lists
employee <- list(
  name       = "Alice Johnson",
  age        = 32,
  salary     = 85000.50,
  skills     = c("R", "Python", "SQL", "Tableau"),
  is_active  = TRUE,
  address    = list(city="New York", state="NY", zip="10001")
)

# Accessing list elements
employee$name           # "Alice Johnson"
employee[["salary"]]    # 85000.5
employee[[4]]           # c("R", "Python", "SQL", "Tableau")
employee$skills[1]      # "R" (first skill)
employee$address$city   # "New York" (nested list)

# Modify list
employee$salary <- 92000        # Update existing
employee$department <- "Data"   # Add new field
employee$skills <- c(employee$skills, "Shiny")  # Append

# List operations
length(employee)     # Number of top-level elements
names(employee)      # Names of all elements
str(employee)        # Compact structure display

# lapply/sapply on list
dept_sizes <- list(IT=45, HR=12, Finance=28, Marketing=19, Sales=65)
total <- sum(unlist(dept_sizes))
percentages <- sapply(dept_sizes, function(x) round(x/total * 100, 1))
print(percentages)
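
The block above uses sapply() without showing how it differs from lapply(). The following is a minimal sketch that reuses the dept_sizes values from the example; the names as_list and as_vector are illustrative and not part of the chapter's code.

```r
# Same department headcounts as in the example above
dept_sizes <- list(IT = 45, HR = 12, Finance = 28, Marketing = 19, Sales = 65)
total <- sum(unlist(dept_sizes))

# lapply() always returns a list, one element per input element
as_list <- lapply(dept_sizes, function(x) round(x / total * 100, 1))

# sapply() tries to simplify the result; here it yields a named numeric vector
as_vector <- sapply(dept_sizes, function(x) round(x / total * 100, 1))

class(as_list)      # "list"
class(as_vector)    # "numeric"
as_vector[["IT"]]   # 26.6
```

Reaching for lapply() is usually the safer choice when the result of each call may vary in length, since sapply() changes its return type depending on what it can simplify.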

3. Data Frames

r
# Data frame: table with named columns of potentially different types
employees <- data.frame(
  id         = 1:6,
  name       = c("Alice", "Bob", "Carol", "David", "Eve", "Frank"),
  dept       = c("IT", "HR", "IT", "Finance", "IT", "HR"),
  salary     = c(85000, 55000, 78000, 92000, 95000, 52000),
  years_exp  = c(5, 2, 3, 8, 10, 1),
  is_senior  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Data frame properties
dim(employees)      # 6 6 (rows, columns; stringsAsFactors is an argument, not a column)
nrow(employees)     # 6
ncol(employees)     # 6
names(employees)    # column names
str(employees)      # structure overview
summary(employees)  # statistical summary of all columns

# Accessing columns
employees$name          # Column as vector
employees[, "salary"]   # Same
employees[, 4]          # By position

# Accessing rows
employees[1, ]          # First row
employees[2:4, ]        # Rows 2-4
employees[employees$dept == "IT", ]   # Filter: IT dept only
employees[employees$salary > 80000, c("name", "salary")]  # Filter + select
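
One detail the accessors above do not show: selecting a single column with [, col] drops the result to a plain vector. The sketch below uses a small hypothetical data frame called scores (not part of the chapter's data) to show how to keep a one-column data frame when that is what later code expects.

```r
# Small illustrative data frame (hypothetical data)
scores <- data.frame(
  name  = c("Alice", "Bob", "Carol"),
  score = c(91, 78, 85)
)

as_vec  <- scores[, "score"]                 # numeric vector (dimension dropped)
as_df   <- scores[, "score", drop = FALSE]   # one-column data frame
also_df <- scores["score"]                   # list-style indexing keeps the data frame

is.data.frame(as_vec)    # FALSE
is.data.frame(as_df)     # TRUE
is.data.frame(also_df)   # TRUE
```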

4. Data Frame Manipulation

r
# Add new column
employees$bonus <- employees$salary * 0.10
employees$total_comp <- employees$salary + employees$bonus

# Add new row
new_emp <- data.frame(
  id=7, name="Grace", dept="IT", salary=88000,
  years_exp=4, is_senior=FALSE, bonus=8800, total_comp=96800
)
employees <- rbind(employees, new_emp)

# Modify values
employees$salary[employees$name == "Bob"] <- 58000  # Update Bob's salary

# Remove column
employees$is_senior <- NULL

# Rename column
names(employees)[names(employees) == "years_exp"] <- "experience"

# Sort data frame
employees <- employees[order(employees$salary, decreasing=TRUE), ]

# Subset (base R)
it_dept <- subset(employees, dept == "IT" & salary > 75000,
                   select = c(name, salary, experience))
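
Adding a row with rbind() works in the block above because new_emp has exactly the same columns as employees. As a rough sketch of what that requirement means in practice, the example below uses a hypothetical team data frame: columns are matched by name, so their order may differ, but a row with a missing column is rejected.

```r
# Hypothetical three-column table
team <- data.frame(id = 1:2, name = c("Alice", "Bob"), salary = c(85000, 55000))

# Columns listed in a different order than in `team`
new_hire <- data.frame(salary = 88000, name = "Grace", id = 3)

# rbind() on data frames matches columns by name, so the order does not matter
team <- rbind(team, new_hire)
team$name   # "Alice" "Bob" "Grace"

# A row that lacks a column cannot be appended; capture the error instead of stopping
bad_row <- data.frame(id = 4, name = "Hank")
result <- tryCatch(rbind(team, bad_row), error = function(e) "rbind failed")
result      # "rbind failed"
```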

5. Mini Project: Employee Records System

r
# ─── EMPLOYEE RECORDS SYSTEM ─────────────────────────
set.seed(123)
depts <- c("Engineering", "Marketing", "Sales", "HR", "Finance")
n <- 30

emp_db <- data.frame(
  id       = 1:n,
  name     = paste0("Employee_", sprintf("%02d", 1:n)),
  dept     = sample(depts, n, replace=TRUE),
  salary   = round(runif(n, 40000, 120000), -3),
  rating   = round(runif(n, 1, 5), 1),
  tenure   = sample(1:15, n, replace=TRUE)
)

emp_db$grade <- ifelse(emp_db$rating >= 4.5, "S",
                 ifelse(emp_db$rating >= 3.5, "A",
                 ifelse(emp_db$rating >= 2.5, "B", "C")))
emp_db$bonus <- round(emp_db$salary * ifelse(emp_db$grade == "S", 0.20,
                                       ifelse(emp_db$grade == "A", 0.15,
                                       ifelse(emp_db$grade == "B", 0.10, 0.05))), -2)

cat("=== EMPLOYEE RECORDS SYSTEM ===\n\n")
cat("Total Employees:", nrow(emp_db), "\n")
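# scientific = FALSE keeps round totals (e.g. 2000000) from printing as 2e+06
cat("Total Payroll:  $", format(sum(emp_db$salary), big.mark=",", scientific=FALSE), "\n")
cat("Avg Salary:     $", format(round(mean(emp_db$salary)), big.mark=",", scientific=FALSE), "\n\n")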

# Department summary
dept_summary <- aggregate(
  cbind(salary, bonus, rating) ~ dept,
  data = emp_db,
  FUN  = function(x) round(mean(x), 2)
)
cat("Department Summary:\n")
print(dept_summary)

# Top performers
top5 <- head(emp_db[order(-emp_db$rating), c("name","dept","salary","rating","grade")], 5)
cat("\nTop 5 Performers:\n")
print(top5)

# Grade distribution
cat("\nGrade Distribution:\n")
print(table(emp_db$grade))
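
The nested ifelse() calls that assign grades in the project can become hard to read as bands are added. One possible alternative, shown here only as a sketch on a hand-picked rating vector rather than on emp_db, is cut() with left-closed intervals. The break points and labels mirror the thresholds used above.

```r
# Hand-picked ratings that include every boundary value
rating <- c(1.0, 2.4, 2.5, 3.4, 3.5, 4.4, 4.5, 5.0)

# Nested ifelse(), as in the mini project
grade_ifelse <- ifelse(rating >= 4.5, "S",
                ifelse(rating >= 3.5, "A",
                ifelse(rating >= 2.5, "B", "C")))

# cut() with right = FALSE uses intervals [-Inf, 2.5), [2.5, 3.5), [3.5, 4.5), [4.5, Inf)
grade_cut <- cut(rating,
                 breaks = c(-Inf, 2.5, 3.5, 4.5, Inf),
                 labels = c("C", "B", "A", "S"),
                 right  = FALSE)

# cut() returns a factor; convert to character before comparing
identical(as.character(grade_cut), grade_ifelse)   # TRUE
```

Because cut() returns a factor, table() on the result also reports bands that happen to be empty, which the character version from ifelse() would omit.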

6. Common Mistakes

  • employee[4] vs employee[[4]]: [4] returns a list with one element. [[4]] returns the element itself. Use [[]] to extract content.
  • stringsAsFactors default: Before R 4.0, data.frame() converted character columns to factors by default. From R 4.0.0 onward the default is FALSE. Set the argument explicitly when code must behave the same across versions.
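
Both mistakes can be checked directly at the console. The following sketch uses a trimmed-down employee list and a hypothetical one-column data frame df, neither of which is the chapter's full data.

```r
employee <- list(name = "Alice Johnson", skills = c("R", "Python", "SQL"))

single  <- employee["skills"]     # list of length 1 that still wraps the vector
content <- employee[["skills"]]   # the character vector itself

class(single)      # "list"
class(content)     # "character"
length(single)     # 1
length(content)    # 3
toupper(content)   # "R" "PYTHON" "SQL"

# Being explicit about stringsAsFactors gives the same result on any R version
df <- data.frame(dept = c("IT", "HR"), stringsAsFactors = TRUE)
is.factor(df$dept)   # TRUE
```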

7. MCQs

Question 1

df$column vs df[["column"]]?

Question 2

list[[2]] vs list[2]?

Question 3

str(df) shows?

Question 4

rbind() combines data frames?

Question 5

aggregate(salary ~ dept, data, mean) computes?

Question 6

df$col <- NULL does?

Question 7

subset(df, cond, select=cols) is equivalent to?

Question 8

names(df)[3] <- "new_name" renames?

Question 9

summary(df) on numeric columns shows?

Question 10

Ordered data frame with order(-salary)?

8. Interview Questions

  • Q: What is the difference between a list and a data frame in R?
  • Q: How do you filter and select specific columns from a data frame?

9. Summary

Lists: ordered collections of any type, accessed with $name or [[i]]. Data frames: rectangular tables with named columns — R's primary data analysis structure. Key operations: subset(), aggregate(), rbind(), cbind(), order(). str() for structure overview, summary() for statistics. df$col <- NULL removes columns. In R 4.0+, strings are not auto-converted to factors.

10. Next Chapter Recommendation

In Chapter 11: Working with Strings in R, we master string manipulation using both base R and the stringr package from the tidyverse.
