R Programming
CHAPTER 11 Beginner

Working with Strings in R

Updated: May 18, 2026

1. Chapter Introduction

Text data is everywhere: customer names, product descriptions, survey responses. This chapter covers string manipulation with both base R functions and the tidyverse stringr package, which offers a consistent API in which every function starts with str_ and takes the string as its first argument.

2. Base R String Operations

r
# String creation and basics
s <- "Hello, R World!"
nchar(s)          # 15 (number of characters)
toupper(s)        # "HELLO, R WORLD!"
tolower(s)        # "hello, r world!"
trimws("  hello  ")      # "hello" (trim whitespace both sides)
trimws("  hello  ", "left")   # "hello  " (left only)
trimws("  hello  ", "right")  # "  hello" (right only)

# Concatenation
paste("Hello", "World")          # "Hello World" (space separator)
paste("Hello", "World", sep="-") # "Hello-World"
paste0("Hello", "World")         # "HelloWorld" (no separator)
paste(1:5, collapse=", ")        # "1, 2, 3, 4, 5"

# Substring operations
substr("RStudio", 1, 2)   # "RS" (start, stop positions)
substring("R Programming", 3, 13)  # "Programming"
substr(s, 8, 16) <- "Universe!"   # Replaces in place without changing length: s is now "Hello, Universe"

# Splitting
strsplit("Alice,Bob,Carol", ",")       # list: "Alice" "Bob" "Carol"
strsplit(c("a b", "c d e"), " ")      # list of length 2

# Search and replace
sub("World", "R", "Hello World")     # First match only: "Hello R"
gsub("o", "0", "Hello World")        # All matches: "Hell0 W0rld"

# String formatting
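sprintf("Name: %-10s Salary: $%8.2f", "Alice", 85000)
# "Name: Alice      Salary: $85000.00" (%-10s left-aligns in 10 chars; %8.2f gives 2 decimals in 8 chars)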
format(12345.678, nsmall=2, big.mark=",")  # "12,345.68"
formatC(0.0001234, format="e", digits=3)   # "1.234e-04"
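
The functions above are usually combined rather than used one at a time. The following is a minimal sketch of that, assuming a made-up vector of "last,first" records; the input data and the helper name cap() are hypothetical and exist only for illustration.

```r
# Hypothetical input: messy "last,first" records (invented for this sketch)
raw <- c("  smith,JOHN ", "doe,jane", " LEE,amy")

# 1. Trim outer whitespace, then split each record on the comma.
#    strsplit() returns a list with one character vector per input string.
parts <- strsplit(trimws(raw), ",")

# 2. Pull the first and second piece out of every list element
last  <- sapply(parts, `[`, 1)
first <- sapply(parts, `[`, 2)

# 3. Hypothetical helper: upper-case the first letter, lower-case the rest
cap <- function(x) {
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
}

# 4. Recombine as "First Last"
full <- paste(cap(first), cap(last))
full
# "John Smith" "Jane Doe" "Amy Lee"

# 5. One fixed-width report line per name (name left-aligned, length right-aligned)
report <- sprintf("%-12s|%3d", full, nchar(full))
report
```

Every step is vectorized, so the same code works unchanged on a vector of any length.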

3. The stringr Package

r
library(stringr)

s <- c("Alice Johnson", "  Bob Smith  ", "carol davis", "David_Lee-Jr")

# Detect and count
str_detect(s, "Alice")     # TRUE FALSE FALSE FALSE
str_count(s, "[aeiou]")    # vowel counts per string

# Case and whitespace
str_to_upper(s)            # ALL CAPS
str_to_lower(s)            # all lowercase
str_to_title(s)            # Title Case
str_trim(s)                # Remove leading/trailing whitespace
str_squish(s)              # Also collapses internal spaces

# Extract and replace
str_sub(s, 1, 5)           # First 5 chars
str_replace(s, "_", " ")   # First underscore → space
str_replace_all(s, "[_-]", " ")  # All underscores and hyphens

# Split and combine
str_split("2024-01-15", "-")   # list: "2024" "01" "15"
str_c("Hello", c("Alice", "Bob"), sep=", ")  # "Hello, Alice" "Hello, Bob"

# Padding and length
str_pad("42", width=6, pad="0")   # "000042" (like printf)
str_pad("R",  width=10, side="both")  # "    R     " (centered)
str_length(s)   # Character counts

# Pattern extraction
emails <- c("alice@company.com", "bob@gmail.com", "invalid-email")
str_detect(emails, "^[a-zA-Z0-9.]+@[a-zA-Z0-9.]+\\.[a-zA-Z]{2,}$")
# TRUE TRUE FALSE
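str_extract(emails, "(?<=@).+$")  # Domain part: "company.com" "gmail.com" NA
# Note: str_extract() returns the whole match, so "@(.+)$" would keep the "@"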
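
Because every stringr function takes the string first and returns a vector of the same length, the calls above chain naturally. Here is a small sketch that normalises the same name vector s step by step; it assumes the stringr package is installed, and the intermediate variable names are arbitrary.

```r
library(stringr)

s <- c("Alice Johnson", "  Bob Smith  ", "carol davis", "David_Lee-Jr")

# Step 1: turn underscores and hyphens into spaces
step1 <- str_replace_all(s, "[_-]", " ")

# Step 2: drop leading/trailing whitespace and collapse repeated spaces
step2 <- str_squish(step1)

# Step 3: consistent capitalisation
clean <- str_to_title(step2)
clean
# "Alice Johnson" "Bob Smith" "Carol Davis" "David Lee Jr"

# First letter of each cleaned name
initials <- str_sub(clean, 1, 1)
initials
# "A" "B" "C" "D"

# Character counts, zero-padded to a fixed width of 3
lens <- str_pad(as.character(str_length(clean)), width = 3, pad = "0")
lens
# "013" "009" "011" "012"
```

Running str_replace_all() before str_squish() matters here: replacing separators can introduce new spaces, and squishing afterwards cleans them up in one pass.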

4. Regular Expressions (Regex)

r
# Common regex patterns in R
texts <- c("Phone: 555-1234", "Email: alice@test.com",
            "ZIP: 10001", "Date: 2024-01-15")

# Extract phone numbers
str_extract(texts, "\\d{3}-\\d{4}")  # "555-1234" NA NA NA

# Extract emails
str_extract(texts, "[a-zA-Z0-9.]+@[a-zA-Z0-9.]+\\.[a-zA-Z]+")

# Extract the first run of four digits (this matches more than just years)
str_extract(texts, "\\d{4}")  # "1234" NA "1000" "2024"

# Key regex patterns:
# .     — any character
# \\d   — digit [0-9]
# \\w   — word char [a-zA-Z0-9_]
# \\s   — whitespace
# ^     — start of string
# $     — end of string
# +     — one or more
# *     — zero or more
# ?     — zero or one
# {n,m} — n to m occurrences
# [abc] — character class
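
To see how the anchors and quantifiers in the list above change what matches, here is a short sketch using base R's grepl(), which returns TRUE or FALSE per string. The test vectors are made up for illustration; the same patterns can be passed to str_detect().

```r
# Hypothetical test strings
x <- c("10001", "1000", "ZIP: 10001", "ab12", "")

# Without anchors: the pattern may match anywhere inside the string
has_five <- grepl("\\d{5}", x)
has_five
# TRUE FALSE TRUE FALSE FALSE

# With ^ and $: the whole string must be exactly five digits
only_five <- grepl("^\\d{5}$", x)
only_five
# TRUE FALSE FALSE FALSE FALSE

# + requires at least one word character, so the empty string fails
word_plus <- grepl("^\\w+$", x)
word_plus
# TRUE TRUE FALSE TRUE FALSE

# * allows zero characters, so the empty string passes
word_star <- grepl("^\\w*$", x)
word_star
# TRUE TRUE FALSE TRUE TRUE

# ? makes the preceding character optional
optional_u <- grepl("colou?r", c("color", "colour", "colr"))
optional_u
# TRUE TRUE FALSE
```

Forgetting the anchors is a frequent source of false positives when validating input, as the "ZIP: 10001" case shows.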

5. Common Mistakes

  • paste() vs paste0(): paste() inserts a space between its arguments by default, while paste0() uses no separator. To concatenate without spaces, use paste0() or pass sep="" to paste().
  • Regex escaping: In an R string literal, \d must be written as \\d because \ is R's own escape character. Double the backslash in every regex escape: \\d, \\w, \\s.
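
Both mistakes are easy to reproduce at the console. The sketch below uses made-up names; the raw-string syntax r"(...)" in the last lines is an alternative to doubling backslashes and assumes R 4.0.0 or later.

```r
# Hypothetical values for illustration
first <- "Ada"
last  <- "Lovelace"

spaced   <- paste(first, last)            # default separator is a space
unspaced <- paste0(first, last)           # no separator
explicit <- paste(first, last, sep = "")  # same result as paste0()

spaced    # "Ada Lovelace"
unspaced  # "AdaLovelace"
explicit  # "AdaLovelace"

# "\\d" in source code is a two-character string: a backslash and the letter d
pattern_len <- nchar("\\d")
pattern_len  # 2

# The regex engine then sees \d and matches a digit
has_digit <- grepl("\\d", c("abc", "a1c"))
has_digit  # FALSE TRUE

# Raw strings (assumes R >= 4.0.0) avoid the doubling entirely
same_pattern <- identical("\\d", r"(\d)")
same_pattern  # TRUE
```

Writing a single backslash, as in "\d", is a syntax error in R because \d is not a recognised string escape.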

6. MCQs

Question 1

nchar("Hello") returns?

Question 2

paste0("R", "Studio") returns?

Question 3

gsub() vs sub()?

Question 4

str_detect(x, "\\d") returns?

Question 5

str_trim() removes?

Question 6

strsplit("a,b,c", ",") returns?

Question 7

\\d{4} matches?

Question 8

str_pad("7", 3, pad="0") returns?

Question 9

toupper() converts?

Question 10

str_extract() returns?

7. Interview Questions

  • Q: What is the difference between gsub() and sub() in R?
  • Q: How do you extract email addresses from a vector of strings?
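
Both questions can be explored with a few lines of base R. This is one possible sketch, not the only valid answer: the sample strings are invented, the email pattern is the simplified one used earlier in this chapter, and regmatches() with gregexpr() is a base R idiom for collecting every match per string.

```r
# sub() replaces only the first match; gsub() replaces all of them
d <- "2024-01-15"
first_only <- sub("-", "/", d)
all_of_them <- gsub("-", "/", d)
first_only   # "2024/01-15"
all_of_them  # "2024/01/15"

# Hypothetical free-text notes, some containing several addresses
notes <- c("Contact alice@test.com or bob@gmail.com",
           "no address here")

# Simplified email pattern (not a full validator)
email_pattern <- "[a-zA-Z0-9.]+@[a-zA-Z0-9.]+\\.[a-zA-Z]+"

# gregexpr() locates every match; regmatches() pulls the matched text out.
# The result is a list with one character vector per input string.
found <- regmatches(notes, gregexpr(email_pattern, notes))
found[[1]]  # "alice@test.com" "bob@gmail.com"
found[[2]]  # character(0)

# Flatten to a single vector of addresses
all_emails <- unlist(found)
all_emails
```

With stringr, str_extract() returns only the first match per string, which is why a list-returning approach is used here when a string may hold more than one address.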

8. Summary

String essentials: nchar(), toupper()/tolower(), paste()/paste0(), substr(), sub()/gsub(), strsplit(). stringr package: str_detect(), str_extract(), str_replace_all(), str_trim(), str_c(), str_pad(). Regex in R requires double-backslash escaping (\\d, \\w). Use stringr for consistent, readable string code in tidyverse workflows.

9. Next Chapter Recommendation

In Chapter 12: File Handling in R, we read and write text files, CSVs, and manage file system paths for data pipeline workflows.
