CHAPTER 17
Security Monitoring and Incident Response
Updated: May 15, 2026
1. Introduction
Prevention eventually fails. No matter how many firewalls you build or how many secure coding practices you enforce, a determined attacker will eventually find a way in, often because an employee clicked a well-crafted phishing link. What matters next is how quickly you detect the intruder and how efficiently you eradicate them. This is the domain of the Blue Team (the defenders). In this chapter, we will explore the critical role of Logs, the function of a SIEM, and the structured lifecycle of Incident Response.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define the role of a Security Operations Center (SOC).
- Understand the importance of centralized Logging.
- Define a SIEM (Security Information and Event Management) system.
- Understand the six phases of the Incident Response Lifecycle and how they map to the four-phase NIST lifecycle.
- Differentiate between a True Positive and a False Positive alert.
3. Beginner-Friendly Explanation
Imagine a museum security system.
- The Logs (The Cameras & Sensors): Every door has a sensor. Every hallway has a camera. If a door opens, a sensor records: "Door 3 opened at 2:00 AM."
- The SIEM (The Security Desk): It is impossible for one guard to watch 500 camera screens simultaneously. A computer system (The SIEM) monitors all sensors. If it sees "Door 3 opened" AND "Motion detected in the Diamond Room", it flashes a giant red alert on the guard's screen.
- Incident Response (The Guards): The guard sees the red alert. They follow a strict playbook: lock the exits, call the police, and apprehend the intruder.
4. Logs and the SIEM
A Log is a text record of an event (e.g., *User Alice logged in from IP 192.168.1.5 at 14:00*). Across a whole organization, firewalls, web servers, and Windows machines can together generate thousands of log entries per second. A SIEM (e.g., Splunk, IBM QRadar, Elastic Security) is a central platform that ingests all these logs, stores them in a searchable database, and correlates events from different sources. A SOC Analyst writes detection "Rules" in the SIEM, for example: *Rule:* "If a user fails to log in 50 times within 1 minute (Brute Force) and then successfully logs in on the 51st try, trigger a Critical Alert."
5. SOC Alerts: Positives and Negatives
When a SIEM fires an alert, the SOC Analyst must investigate it:
- True Positive: The alert fired, and it is a real hacker. Action required!
- False Positive: The alert fired, but it was just a forgetful employee typing their password wrong 50 times. (Analysts spend much of their day investigating false positives and tuning rules to reduce them.)
- False Negative: The absolute worst-case scenario. A hacker broke in, but the SIEM failed to detect it. The attack goes unnoticed.
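The brute-force rule from Section 4 can be sketched in Python. This is a minimal illustration and not how Splunk, QRadar, or Elastic actually express rules, since each has its own query language. The raw log format, the `LoginEvent` schema, and the function names are hypothetical. The sketch also assumes one particular reading of "50 failures within 1 minute": at least 50 failures by the same user in the 60 seconds before the successful login.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 50         # failed logins that must precede the success
WINDOW = timedelta(minutes=1)  # sliding window for counting failures


@dataclass
class LoginEvent:
    """A normalized login event (hypothetical schema, not any SIEM's)."""
    timestamp: datetime
    user: str
    source_ip: str
    success: bool


def parse_line(line: str) -> LoginEvent:
    """Parse a hypothetical raw log line such as
    '2026-05-15T14:00:00 LOGIN_FAILURE user=alice ip=203.0.113.7'."""
    timestamp, outcome, *fields = line.split()
    attrs = dict(field.split("=", 1) for field in fields)
    return LoginEvent(
        timestamp=datetime.fromisoformat(timestamp),
        user=attrs["user"],
        source_ip=attrs["ip"],
        success=(outcome == "LOGIN_SUCCESS"),
    )


def detect_brute_force(events):
    """Return one alert per successful login that follows at least
    FAILURE_THRESHOLD failures for the same user within WINDOW."""
    recent_failures = defaultdict(deque)  # user -> timestamps of failures
    alerts = []
    for event in sorted(events, key=lambda e: e.timestamp):
        failures = recent_failures[event.user]
        # Slide the window: forget failures older than WINDOW.
        while failures and event.timestamp - failures[0] > WINDOW:
            failures.popleft()
        if not event.success:
            failures.append(event.timestamp)
            continue
        if len(failures) >= FAILURE_THRESHOLD:
            alerts.append({
                "severity": "Critical",
                "user": event.user,
                "source_ip": event.source_ip,
                "time": event.timestamp,
                "failures_in_window": len(failures),
            })
        failures.clear()  # any successful login resets the counter


    return alerts


if __name__ == "__main__":
    start = datetime(2026, 5, 15, 14, 0, 0)
    raw = [
        f"{(start + timedelta(seconds=i)).isoformat()} LOGIN_FAILURE user=alice ip=203.0.113.7"
        for i in range(50)
    ]
    raw.append(f"{(start + timedelta(seconds=50)).isoformat()} LOGIN_SUCCESS user=alice ip=203.0.113.7")
    for alert in detect_brute_force(parse_line(line) for line in raw):
        print(alert)
```

The rule itself cannot tell an attacker from a forgetful employee, because both produce the same sequence of events and therefore the same alert. This is why an analyst still has to triage every alert into a True or False Positive.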
6. The Incident Response (IR) Lifecycle
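When a True Positive is confirmed, the Incident Response team takes over. The widely used SANS model defines six phases. NIST SP 800-61 Rev. 2 covers the same ground in four phases: Preparation; Detection & Analysis; Containment, Eradication & Recovery; and Post-Incident Activity.
- 1. Preparation: Having backups, playbooks, and a trained team ready *before* a breach happens.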
- 2. Identification: Confirming a breach has actually occurred based on alerts or user reports.
- 3. Containment: Stopping the bleeding (e.g., unplugging the infected server from the network so the ransomware can't spread).
- 4. Eradication: Removing the malware, deleting the hacker's accounts, and patching the vulnerability they used to get in.
- 5. Recovery: Restoring the server from clean backups and bringing it back online.
- 6. Lessons Learned: Writing a report detailing what failed and how to prevent it from happening again.
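A small Python lookup can show how the six phases above line up with the four NIST SP 800-61 Rev. 2 phases. The `IRPhase` enum, the helper names, and the `allowed_next` rule are illustrative assumptions. The loop back to Identification is meant to mirror the NIST lifecycle diagram, where work during containment can send the team back to detection and analysis, for example when another compromised host turns up.

```python
from enum import IntEnum


class IRPhase(IntEnum):
    """The six-phase incident response lifecycle, in order."""
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6


# Corresponding phase in the four-phase NIST SP 800-61 Rev. 2 lifecycle.
NIST_800_61R2 = {
    IRPhase.PREPARATION: "Preparation",
    IRPhase.IDENTIFICATION: "Detection & Analysis",
    IRPhase.CONTAINMENT: "Containment, Eradication & Recovery",
    IRPhase.ERADICATION: "Containment, Eradication & Recovery",
    IRPhase.RECOVERY: "Containment, Eradication & Recovery",
    IRPhase.LESSONS_LEARNED: "Post-Incident Activity",
}


def next_phase(current: IRPhase) -> IRPhase:
    """Advance one phase; Lessons Learned feeds back into Preparation."""
    if current is IRPhase.LESSONS_LEARNED:
        return IRPhase.PREPARATION
    return IRPhase(current + 1)


def allowed_next(current: IRPhase) -> set:
    """Phases the team may move to from `current` (illustrative rule)."""
    options = {next_phase(current)}
    # Finding another compromised host during containment, eradication,
    # or recovery sends the team back to Identification.
    if current in (IRPhase.CONTAINMENT, IRPhase.ERADICATION, IRPhase.RECOVERY):
        options.add(IRPhase.IDENTIFICATION)
    return options


if __name__ == "__main__":
    for phase in IRPhase:
        label = phase.name.replace("_", " ").title()
        print(f"{phase.value}. {label:<16} -> NIST: {NIST_800_61R2[phase]}")
```

Treating the lifecycle as a cycle rather than a straight line reflects the intent of the final phase. The lessons from one incident should become the preparation for the next.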
7. Mini Project: Build an Incident Response Workflow
Let's build a conceptual IR playbook for a specific scenario: "Malware Detected on Employee Laptop."
The Playbook:
- 1. Identification: The antivirus software alerts the SIEM that "Ransomware.CryptXXX" was detected on Alice's laptop.
- 2. Containment:
- *Immediate Action:* Isolate Alice's laptop from the network: shut down its port on the network switch and block the device on the Wi-Fi controller. Do NOT turn the laptop off (you might lose volatile forensic evidence held in RAM).
- 3. Eradication:
- Image the hard drive for forensic analysis.
- Wipe the laptop's hard drive completely. Do not try to "clean" the virus; wipe it.
- 4. Recovery:
- Reinstall a clean Windows OS.
- Restore Alice's files from yesterday's cloud backup, after confirming that the backup predates the infection.
- 5. Lessons Learned: Analyze how the malware arrived. Was it a phishing email? If so, block the sender's domain and retrain Alice.
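The playbook above can also be encoded as an ordered checklist, so that tooling enforces the sequence instead of relying on memory during a crisis. This Python sketch is a hypothetical design: the `Step` and `Playbook` classes and the step IDs are invented for illustration, and real SOAR platforms use their own formats. The key invariant it enforces is that no step can run before all earlier steps are finished. In practice, that means the disk cannot be wiped before it has been imaged for forensics.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: str       # short hypothetical identifier for the step
    phase: str         # IR phase this step belongs to
    description: str
    done: bool = False


@dataclass
class Playbook:
    name: str
    steps: list
    audit_log: list = field(default_factory=list)

    def complete(self, step_id: str) -> None:
        """Mark a step done, refusing to skip any earlier unfinished step."""
        for step in self.steps:
            if step.step_id == step_id:
                step.done = True
                self.audit_log.append(f"[{step.phase}] {step.description}")
                return
            if not step.done:
                raise RuntimeError(
                    f"Cannot run '{step_id}' before '{step.step_id}' is done"
                )
        raise KeyError(f"Unknown step: {step_id}")

    def remaining(self) -> list:
        return [s.step_id for s in self.steps if not s.done]


def ransomware_laptop_playbook() -> Playbook:
    """Build a fresh copy of the 'Malware Detected on Employee Laptop' playbook."""
    return Playbook(
        name="Malware Detected on Employee Laptop",
        steps=[
            Step("confirm_alert", "Identification",
                 "Confirm the antivirus alert for Ransomware.CryptXXX"),
            Step("isolate", "Containment",
                 "Cut the laptop off the wired and wireless network; keep it powered on"),
            Step("image_disk", "Eradication", "Image the hard drive for forensics"),
            Step("wipe_disk", "Eradication", "Wipe the hard drive completely"),
            Step("reinstall_os", "Recovery", "Reinstall a clean Windows OS"),
            Step("restore_files", "Recovery", "Restore files from a pre-infection backup"),
            Step("root_cause", "Lessons Learned",
                 "Find how the malware arrived and block that vector"),
        ],
    )


if __name__ == "__main__":
    pb = ransomware_laptop_playbook()
    pb.complete("confirm_alert")
    pb.complete("isolate")
    try:
        pb.complete("wipe_disk")  # tries to skip the forensic image
    except RuntimeError as err:
        print(err)  # Cannot run 'wipe_disk' before 'image_disk' is done
    print(pb.remaining())
```

The audit log records each completed step in order, which can serve as a starting timeline for the Lessons Learned report.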
8. Real-World Scenarios
A hospital's network is hit by ransomware. The hospital had skipped the Preparation phase, so it had no playbook and no offline backups. When the SIEM alerts the staff, panic ensues. The IT team fails the Containment phase by leaving the infected machines connected to the network, which lets the ransomware spread to the MRI machines. With no backups available for the Recovery phase, the hospital is forced to pay a $5 million ransom. Proper Incident Response could have contained the damage to the first infected machine.
9. Best Practices
- Immutable Backups: Ransomware gangs now actively seek out backup servers and delete or encrypt the backups before encrypting your main servers. You must keep "Immutable" backups (e.g., AWS S3 Object Lock in compliance mode), which no one, not even the System Administrator, can delete or modify until the retention period ends. Offline tape backups, which attackers on the network cannot reach at all, are the other option.
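Here is a minimal Python sketch of the S3 Object Lock option. It builds a default-retention rule and applies it through a client object that is assumed to be a boto3 S3 client (`boto3.client("s3")`). `put_bucket_versioning` and `put_object_lock_configuration` are real boto3 S3 client methods. The helper names, the example bucket name, and the 30-day retention period are hypothetical choices for illustration. Object Lock only works on versioned buckets, and a default retention rule applies to object versions written after the rule is in place.

```python
VALID_MODES = {"GOVERNANCE", "COMPLIANCE"}


def object_lock_config(retention_days: int, mode: str = "COMPLIANCE") -> dict:
    """Build the ObjectLockConfiguration for S3's PutObjectLockConfiguration.

    COMPLIANCE: no user, including the account root user, can overwrite or
    delete a protected object version until its retention period expires.
    GOVERNANCE: users granted a special bypass permission can still do so.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if retention_days < 1:
        raise ValueError("retention_days must be a positive integer")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": retention_days}},
    }


def lock_backup_bucket(s3_client, bucket: str, retention_days: int,
                       mode: str = "COMPLIANCE") -> None:
    """Apply a default retention rule to a backup bucket.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # Object Lock requires versioning, so make sure it is enabled first.
    s3_client.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
    )
    s3_client.put_object_lock_configuration(
        Bucket=bucket,
        ObjectLockConfiguration=object_lock_config(retention_days, mode),
    )


if __name__ == "__main__":
    # Prints the request body only; calling lock_backup_bucket against a real
    # bucket (hypothetical name "example-backup-bucket") needs AWS credentials.
    print(object_lock_config(30))
```

Compliance mode is deliberately irreversible for the retention period, so a misconfigured rule cannot be undone either. A sensible precaution is to try the rule first with a short retention period or in governance mode.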
10. Legal and Ethical Notes
During Incident Response, preserving evidence is critical. If an employee is suspected of malicious insider activity, IT staff cannot simply rummage through that employee's laptop. Digital forensics experts must maintain a strict "Chain of Custody" (a documented record of who collected, handled, and transferred each piece of evidence, and when) to ensure the evidence is admissible in a court of law.
11. Exercises
- 1. Define the role of a SIEM in a modern Security Operations Center (SOC). Why is manually reading log files on individual servers no longer a viable defensive strategy?
- 2. Explain the critical difference between the Containment phase and the Eradication phase of Incident Response.
12. FAQs
Q: How fast do hackers move once they get inside? A: "Breakout time" is the time it takes an attacker to move laterally from the first compromised machine to a second one. Measurements vary by report and by year, but recent industry figures put the average on the order of an hour, and the fastest attackers have broken out in minutes. The SOC must detect, investigate, and contain the threat faster than the breakout time.
13. Interview Questions
- Q: Describe the six phases of the Incident Response Lifecycle and how they map to the four phases of NIST SP 800-61. In a scenario where an active ransomware infection is detected on a workstation, detail your specific actions during the Containment and Eradication phases.
- Q: A SIEM rule triggers an alert for "Multiple Failed Login Attempts Followed by Success." How would you investigate this alert to determine if it is a True Positive (a successful brute-force attack) or a False Positive (user error)?