# Python Regular Expressions
Welcome to Chapter 23! Regular expressions (regex) are powerful tools for pattern matching in strings — validating emails, extracting data, searching and replacing text.
---
1. Learning Objectives
- Understand regex patterns and metacharacters.
- Use `re` module functions: `match`, `search`, `findall`, `sub`.
- Write patterns for validation and extraction.
- Apply regex in real-world scenarios.
---
2. Getting Started with re
```python id="py23_ex1"
import re

# search: find the first match anywhere in the string
result = re.search(r"Python", "I love Python programming")
if result:
    print(f"Found: '{result.group()}' at position {result.start()}")

# match: match only at the BEGINNING of the string
result = re.match(r"Hello", "Hello World")
print(result.group())  # Hello

# findall: find ALL matches
numbers = re.findall(r"\d+", "I have 3 cats and 5 dogs")
print(numbers)  # ['3', '5']
```

Common patterns for validation and extraction:

```python id="py23_ex2"
import re

# Email validation
email_pattern = r'^[\w\.-]+@[\w\.-]+\.\w{2,}$'
print(re.match(email_pattern, "user@example.com"))  # a Match object
print(re.match(email_pattern, "invalid-email"))     # None

# Phone number
phone_pattern = r'\d{3}[-.\s]?\d{3}[-.\s]?\d{4}'
phones = re.findall(phone_pattern, "Call 123-456-7890 or 987.654.3210")
print(phones)  # ['123-456-7890', '987.654.3210']

# URL extraction
url_pattern = r'https?://[\w\.-]+(?:/[\w\.-]*)*'
text = "Visit https://python.org and http://example.com/page"
urls = re.findall(url_pattern, text)
print(urls)  # ['https://python.org', 'http://example.com/page']
```

Replacing, splitting, and capturing groups:

```python id="py23_ex3"
import re

# sub: search and replace
text = "My phone is 123-456-7890"
censored = re.sub(r'\d', '*', text)
print(censored)  # My phone is ***-***-****

# split: split by pattern
data = "apple;banana,cherry orange"
items = re.split(r'[;,\s]+', data)
print(items)  # ['apple', 'banana', 'cherry', 'orange']

# Groups: extract parts of a match
pattern = r'(\w+)@(\w+)\.(\w+)'
match = re.search(pattern, "Contact: alice@gmail.com")
if match:
    print(f"User: {match.group(1)}")    # alice
    print(f"Domain: {match.group(2)}")  # gmail
    print(f"TLD: {match.group(3)}")     # com
```

Flags change how a pattern is applied:

```python id="py23_ex4"
import re

# Case-insensitive
result = re.findall(r'python', 'Python PYTHON python', re.IGNORECASE)
print(result)  # ['Python', 'PYTHON', 'python']

# Multiline: ^ matches at the start of every line
text = "First line\nSecond line\nThird line"
starts = re.findall(r'^\w+', text, re.MULTILINE)
print(starts)  # ['First', 'Second', 'Third']
```

Putting it together in small validators:

```python id="py23_ex5"
import re

def validate_email(email):
    pattern = r'^[\w\.-]+@[\w\.-]+\.\w{2,}$'
    return bool(re.match(pattern, email))

def validate_phone(phone):
    # The country code is wrapped in an optional group so that plain
    # 10-digit numbers such as "123-456-7890" are accepted too.
    pattern = r'^(?:\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$'
    return bool(re.match(pattern, phone))

def validate_password(pwd):
    # Each entry records whether one rule passed
    checks = {
        "8+ chars": len(pwd) >= 8,
        "Uppercase": bool(re.search(r'[A-Z]', pwd)),
        "Lowercase": bool(re.search(r'[a-z]', pwd)),
        "Digit": bool(re.search(r'\d', pwd)),
        "Special": bool(re.search(r'[!@#$%^&*]', pwd)),
    }
    return all(checks.values()), checks

print(validate_email("test@example.com"))  # True
print(validate_phone("123-456-7890"))      # True

valid, details = validate_password("Hello123!")
print(f"Valid: {valid}, Details: {details}")
```
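
The validators above anchor their patterns with `^` and `$`. One detail that can surprise: `$` also matches just before a trailing newline. The sketch below compares that behaviour with `re.fullmatch`, a standard `re` function that this chapter does not otherwise use. The helper names `is_digits_anchored` and `is_digits_full` are hypothetical and exist only for illustration.

```python
import re

def is_digits_anchored(text):
    # With re.match and ^...$, `$` also matches just before a trailing newline
    return bool(re.match(r'^\d+$', text))

def is_digits_full(text):
    # re.fullmatch succeeds only if the entire string matches the pattern
    return bool(re.fullmatch(r'\d+', text))

print(is_digits_anchored("12345"))    # True
print(is_digits_anchored("12345\n"))  # True (trailing newline slips through)
print(is_digits_full("12345\n"))      # False
```

If input may carry a stray newline (for example, lines read from a file), `re.fullmatch` or stripping the input first is likely the safer choice for validation.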
---
8. MCQs with Answers
Q1: \d matches:
A) Letters B) Digits C) Whitespace D) Any char
Answer: B
Q2: re.findall() returns:
A) First match B) Boolean C) List of all matches D) Match object
Answer: C
Q3: ^ matches:
A) End of string B) Start of string C) Any position D) New line
Answer: B
Q4: re.sub() does:
A) Find B) Search and replace C) Split D) Match
Answer: B
Q5: + quantifier means:
A) 0 or more B) 1 or more C) 0 or 1 D) Exactly 1
Answer: B
Q6: [a-zA-Z] matches:
A) Only lowercase B) Only uppercase C) Any letter D) Alphanumeric
Answer: C
Q7: Raw string prefix for regex:
A) f B) b C) r D) x
Answer: C
Q8: re.IGNORECASE does:
A) Ignores errors B) Case-insensitive matching C) Ignores whitespace D) Ignores newlines
Answer: B
Q9: () in regex creates:
A) Comment B) Capture group C) Class D) Range
Answer: B
Q10: . matches:
A) Only dots B) Any char except newline C) Digits D) Letters
Answer: B
---
9. Interview Questions
- 1. What is regex? A sequence of characters defining a search pattern for string matching.
- 2. `match()` vs `search()`? `match()` checks only at the beginning of the string; `search()` finds the first occurrence anywhere.
- 3. Greedy vs lazy matching? `*` and `+` are greedy (match as much as possible). Add `?` for lazy matching (match as little as possible): `.*?`.
- 4. How to compile regex? `pattern = re.compile(r'\d+')`. This can improve performance when the pattern is reused.
- 5. Named groups? `(?P<name>pattern)`, accessed with `match.group('name')`.
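
The interview answers above can be illustrated with a short sketch. The sample strings and variable names here are assumptions chosen for the example, not part of the chapter's earlier code.

```python
import re

# match() vs search(): match() only looks at the start of the string
print(re.match(r'World', "Hello World"))                # None
print(re.search(r'World', "Hello World") is not None)   # True

# Greedy vs lazy: .* grabs as much as it can, .*? as little as it can
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r'<.*>', html)
lazy = re.findall(r'<.*?>', html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Compiling: build the pattern once, then reuse it
number = re.compile(r'\d+')
counts = [number.findall(line) for line in ["3 cats", "5 dogs and 12 fish"]]
print(counts)  # [['3'], ['5', '12']]

# Named groups: refer to captured parts by name instead of position
date = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', "Released 2024-05")
year, month = date.group('year'), date.group('month')
print(year, month)  # 2024 05
```

Named groups tend to pay off once a pattern has more than two or three groups, since `match.group('year')` stays correct even if groups are reordered.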
---
10. Summary
- The `re` module provides regex support in Python.
- Key functions: `search()`, `match()`, `findall()`, `sub()`, `split()`.
- Common patterns: `\d` (digit), `\w` (word character), `\s` (whitespace), `.` (any character), `^`/`$` (anchors).
- Use raw strings (`r"..."`) for regex patterns.
- Use flags like `re.IGNORECASE` and `re.MULTILINE`.
---
11. Next Chapter Recommendation
In Chapter 24: Working with JSON and APIs, you'll learn to parse JSON and make API requests! 🚀