CHAPTER 28 Beginner

R Programming Interview Preparation

Updated: May 18, 2026


1. Chapter Introduction

R interviews test three dimensions: language proficiency, statistical knowledge, and data manipulation skill. This chapter collects representative questions of the kind asked in data analyst, data scientist, and statistician interviews, followed by coding challenges and a short multiple-choice quiz.

---

Section A: R Language Questions (Q1-Q10)

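Q1. What is the difference between <- and = in R? Both can assign values, but <- is the conventional assignment operator in scripts, while = is normally used to name arguments inside function calls. The difference shows up inside a call: f(x <- 3) assigns x in the calling environment as a side effect, whereas f(x = 3) only matches the argument named x. As an assignment operator, = is allowed only at the top level or as a statement inside braces; <- works inside any expression.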
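The following is a minimal sketch of the difference. It assumes a fresh session in which no variable named `x` exists yet. `median()` is used only because its first argument happens to be called `x`.

```r
# `=` inside a call only matches the argument named `x`; nothing is assigned.
m_eq <- median(x = 1:10)
print(exists("x"))      # FALSE in a fresh session

# `<-` inside a call performs an assignment in the calling environment,
# then passes the assigned value on as the first argument.
m_arrow <- median(x <- 1:10)
print(exists("x"))      # TRUE
print(x)                # 1 2 3 4 5 6 7 8 9 10
```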

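Q2. Explain R's apply family. apply(m, margin, f): applies f over matrix rows (margin 1) or columns (margin 2). lapply(x, f): always returns a list. sapply(x, f): like lapply, but simplifies the result to a vector or matrix when possible. tapply(x, group, f): applies f to subsets of x defined by a grouping variable. mapply(f, ...): applies f element-wise over several vectors in parallel (a multivariate sapply). vapply(x, f, FUN.VALUE): like sapply, but with a declared result type, so unexpected output raises an error instead of silently changing shape.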
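Here is a quick tour of the family on small inputs. The matrix, vectors, and grouping labels are invented for illustration.

```r
m <- matrix(1:6, nrow = 2)                      # 2 x 3 matrix, filled column by column
row_sums <- apply(m, 1, sum)                    # margin 1 = rows: 9 12
col_sums <- apply(m, 2, sum)                    # margin 2 = columns: 3 7 11

sq_list <- lapply(1:3, function(i) i^2)         # list of 1, 4, 9
sq_vec  <- sapply(1:3, function(i) i^2)         # simplified to numeric vector 1 4 9

grp_mean <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), mean)  # a = 20, b = 30

pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)   # 5 7 9

# vapply() declares the type/length of each result and errors if a call breaks it
name_lens <- vapply(c("a", "bb", "ccc"), nchar, FUN.VALUE = integer(1))  # 1 2 3
print(name_lens)
```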

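Q3. What is vectorization in R? Why is it important? A vectorized operation applies to a whole vector in a single call, without an explicit R loop: x * 2 doubles every element of x. The element-by-element loop still happens, but inside compiled C or Fortran code rather than the R interpreter. As a result, vectorized code is often an order of magnitude or more faster than an equivalent for loop, and usually shorter.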
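This is a rough comparison, not a benchmark. The vector size is arbitrary. Timings depend on the machine and R version, and the byte-code compiler narrows the gap for simple loops, so treat the printed numbers as indicative only.

```r
set.seed(1)
x <- runif(1e6)

# Explicit loop: the interpreter evaluates the body once per element
double_loop <- function(v) {
  out <- numeric(length(v))            # preallocate to avoid repeated copying
  for (i in seq_along(v)) out[i] <- v[i] * 2
  out
}

t_loop <- system.time(y_loop <- double_loop(x))
t_vec  <- system.time(y_vec <- x * 2)   # one call; the loop runs in compiled code

stopifnot(identical(y_loop, y_vec))     # same result either way
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```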

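Q4. Difference between NA, NULL, NaN, and Inf? NA: missing-value placeholder of length 1 (typed variants include NA_integer_, NA_real_, and NA_character_). NULL: the empty object, length 0. NaN: result of undefined arithmetic such as 0/0. Inf: infinity, e.g. 1/0 (and -Inf for -1/0). Check them with is.na(), is.null(), is.nan(), and is.infinite(); note that is.na() also returns TRUE for NaN.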
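The checks below show how each special value behaves. The vector `vals` is a made-up example.

```r
vals <- c(1, NA, NaN, Inf, -Inf)
print(is.na(vals))        # FALSE  TRUE  TRUE FALSE FALSE  (NaN counts as NA)
print(is.nan(vals))       # FALSE FALSE  TRUE FALSE FALSE
print(is.infinite(vals))  # FALSE FALSE FALSE  TRUE  TRUE
print(c(0 / 0, 1 / 0))    # NaN Inf

print(length(NA))         # 1
print(length(NULL))       # 0
print(c(1, NULL, 2))      # 1 2: NULL vanishes when combined
```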

Q5. What is the difference between a list and a data frame? A list is an ordered collection of elements of any type and any length. A data frame is a list of equal-length vectors (the columns) and is R's standard structure for tabular data. Data frames also carry row names and column names and support matrix-style indexing such as df[i, j]; lists have no equal-length requirement.
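Here is a small illustration, with arbitrary element names and values. It shows that a data frame is itself a list, but one that enforces equal column lengths.

```r
# A list can hold elements of any type and length
lst <- list(id = 1:3, label = "batch A", summary_fn = mean)
print(lengths(lst))            # id 3, label 1, summary_fn 1

# A data frame is a list of equal-length columns
df <- data.frame(id = 1:3, score = c(90, 85, 77))
print(is.list(df))             # TRUE
print(dim(df))                 # 3 2
print(df[2, "score"])          # 85: matrix-style indexing

# Column lengths that cannot be recycled to a common length are rejected
res <- tryCatch(data.frame(id = 1:3, score = c(90, 85)),
                error = function(e) "error")
print(res)                     # "error"
```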

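Q6. Explain R's lexical scoping. A function looks up a variable first among its own local variables, then in the environment where the function was defined (not where it was called), and so on up the chain of enclosing environments. This is what makes closures possible: functions that "remember" the environment they were created in.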
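The sketch below shows both halves of the answer. `make_counter()` is a closure that keeps state in its creation environment. `show_y()` looks up `y` where it was defined, not where it is called. All names are illustrative.

```r
make_counter <- function() {
  count <- 0                  # lives in the environment created by this call
  function() {
    count <<- count + 1       # <<- updates `count` in the enclosing environment
    count
  }
}
counter <- make_counter()
print(counter())   # 1
print(counter())   # 2

# Lookup follows where show_y() was defined (global), not where it is called
y <- "global"
show_y <- function() y
call_with_local_y <- function() {
  y <- "local"
  show_y()
}
print(call_with_local_y())   # "global"
```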

Q7. What does do.call(f, args) do? It calls function f with the arguments supplied as a list, so do.call(paste, list("hello", "world", sep = "-")) returns "hello-world". It is useful when the argument list is built at run time, for example when combining a list of data frames with do.call(rbind, dfs).
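This sketch shows the typical run-time use. `lapply()` produces a list of small data frames, and a single `do.call(rbind, ...)` stacks them. The data frames are invented for the example.

```r
print(do.call(paste, list("hello", "world", sep = "-")))   # "hello-world"

# Combine a list of data frames whose length is only known at run time
parts <- lapply(1:3, function(i) data.frame(batch = i, value = i * 10))
combined <- do.call(rbind, parts)   # same as rbind(parts[[1]], parts[[2]], parts[[3]])
print(combined)
```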

Q8. When would you use tryCatch()? When an expression might raise an error (or a warning) but execution should continue: tryCatch() intercepts the condition and returns the handler's value instead. Pattern: result <- tryCatch(expr, error=function(e) default_val, warning=function(w) ...).
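`safe_log()` below is a hypothetical helper showing the pattern with both handlers. It returns `NA_real_` whether `log()` warns (negative input produces NaN with a warning) or errors (non-numeric input), and it reports which condition was caught.

```r
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) {
      message("Warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("Error caught: ", conditionMessage(e))
      NA_real_
    }
  )
}

print(safe_log(100))   # 4.60517
print(safe_log(-1))    # NA (log(-1) warns "NaNs produced")
print(safe_log("a"))   # NA (non-numeric argument is an error)
```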

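Q9. What is the pipe operator %>%? Provided by the magrittr package (and re-exported by dplyr), it passes the result of the left-hand side as the first argument of the right-hand side: x %>% f(y) is equivalent to f(x, y). This turns nested calls into readable left-to-right chains. The native pipe |> (R 4.1+) works the same way for simple calls and needs no package. It is less flexible, though: the right-hand side must be a function call, and its placeholder (_, added in R 4.2) is more restricted than magrittr's dot (.) placeholder.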
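This minimal comparison assumes magrittr is installed (it comes with dplyr) and R 4.1 or later for `|>`. The vector is an arbitrary example.

```r
library(magrittr)   # provides %>% (dplyr re-exports it)

x <- c(4, 9, 16)
r_magrittr <- x %>% sqrt() %>% sum()   # sum(sqrt(x)) = 2 + 3 + 4 = 9
r_native   <- x |> sqrt() |> sum()     # native pipe, no package needed

# Extra arguments follow the piped-in value: x %>% f(y) is f(x, y)
r_round <- 3.14159 %>% round(2)        # round(3.14159, 2) = 3.14
print(c(r_magrittr, r_native, r_round))
```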

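Q10. How is a factor different from a character vector? A factor stores categories as compact integer codes plus a set of labels (levels). This fixes the set of allowed categories and lets you control the level order, including ordered factors for ordinal data. It also tells modelling functions such as lm() to treat the variable as categorical and expand it into dummy variables. A character vector holds plain strings. Use factors for categorical variables with fixed, known levels.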
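The example below shows the integer codes behind a factor and the ordering that ordered factors provide. It also covers a classic interview pitfall: converting a numeric-looking factor. The size labels and values are made up.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"), ordered = TRUE)
print(levels(sizes))       # "small" "medium" "large"
print(as.integer(sizes))   # 1 3 2 1: integer codes behind the labels
print(sizes < "large")     # TRUE FALSE TRUE TRUE: comparisons follow level order

# Pitfall: numbers stored as a factor
f <- factor(c("10", "20", "10"))
print(as.numeric(f))                   # 1 2 1 (codes, not values)
print(as.numeric(as.character(f)))     # 10 20 10
```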

---

Section B: Statistical Questions (Q11-Q15)

Q11. When do you use the median instead of the mean? For skewed distributions or data with outliers: the mean is pulled toward extreme values, while the median is not. For income, salary, and housing prices, the median is usually reported as the "typical" value.
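This is a tiny illustration using hypothetical incomes in thousands, with one extreme earner. A single outlier drags the mean far above every other observation, while the median stays near the bulk of the data.

```r
incomes <- c(32, 35, 38, 41, 45, 48, 52, 950)  # hypothetical, in thousands
print(mean(incomes))    # 155.125: pulled up by the single 950
print(median(incomes))  # 43: average of the two middle values, 41 and 45
```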

Q12. Explain the p-value in plain English. p = P(data at least as extreme as observed | H₀ is true). A small p-value means: "If the null hypothesis were true, data this extreme would be unlikely." It is not the probability that H₀ is true, and it does not by itself show that H₁ is probable.
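The definition can be checked by simulation. The scenario is hypothetical: 60 heads in 100 flips, H₀ is a fair coin, and the alternative is one-sided (p > 0.5). The exact p-value from `binom.test()` is compared with the share of simulated fair-coin experiments that are at least as extreme.

```r
set.seed(1)
heads <- 60
n <- 100

# Exact one-sided p-value: P(X >= 60 | X ~ Binomial(100, 0.5))
exact_p <- binom.test(heads, n, p = 0.5, alternative = "greater")$p.value

# Simulated: how often does a fair coin give 60 or more heads?
sim_p <- mean(rbinom(100000, size = n, prob = 0.5) >= heads)

print(c(exact = exact_p, simulated = sim_p))   # both close to 0.028
```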

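Q13. What is the Central Limit Theorem? For independent samples from a population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, whatever the shape of the underlying distribution. Its standard deviation (the standard error) is σ/√n. The CLT underpins many hypothesis tests and confidence intervals.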

Q14. Difference between Type I and Type II error? Type I (α): false positive, rejecting a true H₀. Type II (β): false negative, failing to reject a false H₀. The significance level α sets the Type I error rate. Power (1 − β) depends on sample size, effect size, and α; for a fixed α, collecting more data is the usual way to lower the Type II error rate.
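`power.t.test()` from base R's stats package shows how sample size drives power, and therefore β. The effect size (a difference of 0.5 SD) and α = 0.05 are assumptions chosen for illustration.

```r
# Two-sample t-test, true difference of 0.5 SD, alpha = 0.05 (two-sided)
power_n20 <- power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)$power
power_n64 <- power.t.test(n = 64, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Per-group sample size needed for 80% power (beta = 0.20)
n_needed <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n

print(round(c(power_n20 = power_n20, power_n64 = power_n64, n_needed = n_needed), 2))
```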

Q15. When would you use a paired t-test vs two-sample t-test? Paired t-test: same subjects measured twice (before/after). Two-sample t-test: independent groups (treatment vs control). Paired is more powerful when measurements are correlated within subjects.
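The example below uses simulated before/after scores (hypothetical data) to show why pairing matters. Each subject improves by about 2 points, but scores vary by about 10 points between subjects. The paired test works on the within-subject differences. The (Welch) two-sample test has to compare the small shift against the large between-subject spread.

```r
set.seed(7)
before <- rnorm(15, mean = 80, sd = 10)          # hypothetical baseline scores
after  <- before + rnorm(15, mean = 2, sd = 2)   # each subject improves by about 2 points

p_paired <- t.test(after, before, paired = TRUE)$p.value   # tests mean of the differences
p_welch  <- t.test(after, before)$p.value                  # ignores the pairing
print(c(paired = p_paired, two_sample = p_welch))          # paired p-value is far smaller
```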

---

Section C: Coding Challenges (Challenges 1-7)

```r
# Challenge 1: Remove duplicates and sort by frequency
words <- c("apple", "banana", "apple", "cherry", "banana", "apple", "date")
freq_table <- sort(table(words), decreasing = TRUE)  # apple 3, banana 2, cherry 1, date 1
print(freq_table)
unique_words <- names(freq_table)                    # distinct words, most frequent first

# Challenge 2: Find outliers in a vector (Tukey's 1.5 * IQR rule)
find_outliers <- function(x) {
  x <- x[!is.na(x)]                # quantile() errors on NA unless they are dropped
  Q1 <- quantile(x, 0.25)
  Q3 <- quantile(x, 0.75)
  iqr <- IQR(x)
  x[x < Q1 - 1.5 * iqr | x > Q3 + 1.5 * iqr]
}
find_outliers(c(10, 12, 11, 13, 100, 12, 11, 10, 200))  # 100 200

# Challenge 3: Moving average function
moving_avg <- function(x, window = 3) {
  n <- length(x)
  result <- rep(NA_real_, n)       # the first window - 1 positions stay NA
  if (n < window) return(result)   # guard: window:n would otherwise count backwards
  for (i in window:n) {
    result[i] <- mean(x[(i - window + 1):i])
  }
  result
}
# Vectorized version using stats::filter(). It is namespaced because
# library(dplyr) below masks filter() with dplyr::filter().
moving_avg_fast <- function(x, window = 3) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

# Challenge 4: Summarise multiple columns at once
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp, wt), list(mean = mean, sd = sd), .names = "{col}_{fn}"))

# Challenge 5: Build frequency table with percentage (uses dplyr from Challenge 4)
freq_report <- function(x, label = "Variable") {
  tbl <- table(x)
  pct <- round(prop.table(tbl) * 100, 1)
  result <- data.frame(
    Category = names(tbl),
    Count    = as.numeric(tbl),
    Percent  = as.numeric(pct)
  ) %>% arrange(desc(Count))
  cat(sprintf("=== %s ===\n", label))
  print(result)
  invisible(result)
}

# Challenge 6: Validate and clean email addresses
# (a pragmatic pattern, not a full RFC 5322 validator)
clean_emails <- function(emails) {
  emails  <- trimws(emails)        # cleaning step: strip surrounding whitespace
  pattern <- "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
  valid   <- grepl(pattern, emails, perl = TRUE)
  cat("Valid:", sum(valid), "| Invalid:", sum(!valid), "\n")
  list(valid = emails[valid], invalid = emails[!valid])
}

# Challenge 7: Simulate the Central Limit Theorem
set.seed(42)
n_samples <- 1000
n <- 30                            # observations per sample
rate <- 0.1                        # Exp(0.1) has mean 1/rate = 10 and SD 1/rate = 10
sample_means <- replicate(n_samples, mean(rexp(n, rate = rate)))
cat("Mean of sample means:", round(mean(sample_means), 3), "(theory: 10)\n")
cat("SD of sample means:", round(sd(sample_means), 3),
    "(theoretical SE = 10 / sqrt(30) =", round((1 / rate) / sqrt(n), 3), ")\n")
# The sampling distribution is approximately normal even though the exponential parent is skewed
hist(sample_means, breaks = 30, col = "#1565C0", main = "CLT: Means of Exp(0.1) samples")
```
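Challenge 3 builds a moving average with a loop and with stats::filter(). Another common interview answer uses cumulative sums. The sketch below takes that approach; the helper name `moving_avg_cumsum()` is hypothetical. Each window sum is computed as the difference of two prefix sums, so the cost is linear in the length of x. For very long series, floating-point error accumulated by `cumsum()` is worth keeping in mind.

```r
moving_avg_cumsum <- function(x, window = 3) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < window) return(out)                 # not enough points for one window
  cs <- cumsum(c(0, x))                       # cs[i + 1] == sum(x[1:i])
  idx <- window:n
  out[idx] <- (cs[idx + 1] - cs[idx + 1 - window]) / window
  out
}

v <- c(2, 4, 6, 8, 10)
print(moving_avg_cumsum(v, 3))                               # NA NA 4 6 8
print(as.numeric(stats::filter(v, rep(1/3, 3), sides = 1)))  # same values
```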

4. MCQs

Question 1

What does `do.call(rbind, listofdfs)` do?

Question 2

What does `Reduce("+", 1:5)` return?

Question 3

When does `tryCatch(expr, error = function(e) NA)` return NA?

Question 4

Why is vectorized code faster?

Question 5

What does `which(x > 5)` return?

Question 6

What does `Sys.time()` return?

Question 7

Why does the CLT matter for the sampling distribution of the mean?

Question 8

In dplyr, what does `across(.cols, .fns)` apply, and to what?

Question 9

What does `replicate(n, expr)` do in R?

Question 10

What does `prop.table(table(x)) * 100` give?
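Several of these questions can be checked directly in R. The sketch below runs the code-based ones; Questions 4, 7, and 8 are conceptual and are covered by Q3, Q13, and Challenge 4. All inputs (`listofdfs`, `x`, the letters vector) are invented for the check.

```r
# Question 1: do.call(rbind, listofdfs) stacks the data frames row-wise
listofdfs <- list(data.frame(a = 1), data.frame(a = 2), data.frame(a = 3))
stacked <- do.call(rbind, listofdfs)
print(nrow(stacked))                         # 3

# Question 2: Reduce() folds `+` over the vector: (((1 + 2) + 3) + 4) + 5
total <- Reduce("+", 1:5)
running <- Reduce("+", 1:5, accumulate = TRUE)
print(total)                                 # 15
print(running)                               # 1 3 6 10 15

# Question 3: the handler's NA is returned only when the expression signals an error
ok  <- tryCatch(sqrt(16), error = function(e) NA)
bad <- tryCatch(stop("boom"), error = function(e) NA)
print(c(ok, bad))                            # 4 NA

# Question 5: which() returns positions (indices) where the condition is TRUE
x <- c(3, 8, 1, 9, 6)
idx <- which(x > 5)
print(idx)                                   # 2 4 5

# Question 6: Sys.time() returns the current date-time as a POSIXct object
now <- Sys.time()
print(class(now))                            # "POSIXct" "POSIXt"

# Question 9: replicate() evaluates an expression n times and simplifies the results
set.seed(3)
draws <- replicate(4, mean(runif(10)))
print(length(draws))                         # 4

# Question 10: percentage of observations in each category
pct <- prop.table(table(c("a", "b", "a", "a"))) * 100
print(pct)                                   # a 75, b 25
```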

5. Summary

Interview Q&A covering language topics (<- vs =, the apply family, vectorization, NA/NULL/NaN/Inf, scoping, do.call, tryCatch, pipes, factors), statistics (median vs mean, p-values, the CLT, Type I/II errors, paired vs two-sample t-tests), and coding (frequency tables, outlier detection, moving averages, multi-column summaries, email validation, CLT simulation). Key themes: vectorized thinking, statistical interpretation, tidy workflows, and defensive programming with tryCatch().

6. Next Chapter Recommendation

In Chapter 29: Performance Optimization in R, we profile code, use data.table, vectorize bottlenecks, and parallelize computation.
