CHAPTER 05
Introduction to Reconnaissance and Information Gathering
Updated: May 15, 2026
1. Introduction
In Hollywood movies, hackers smash their keyboards for 30 seconds and instantly bypass a firewall. In reality, a professional penetration test is often described as roughly 80% research and 20% exploitation. The research phase is known as Reconnaissance (or "Recon"). Before you interact with a target system, you should learn as much as you can about it: its IP addresses, the technologies it runs, and the people who work there. In this chapter, we will explore Passive Reconnaissance, using Open-Source Intelligence (OSINT) to gather data legally and quietly, without alerting the target.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define the Reconnaissance phase of penetration testing.
- Differentiate between Passive and Active Reconnaissance.
- Understand Open-Source Intelligence (OSINT).
- Gather domain and IP information using whois and DNS records.
- Analyze public metadata for unintentional information disclosure.
3. Beginner-Friendly Explanation
Imagine a detective preparing to investigate a corporate building.
- Active Recon (Knocking on doors): The detective walks up to the front door, shakes the handle to see if it's locked, and asks the receptionist questions. The company *knows* the detective is there. They might call security. (In cybersecurity, the equivalent is port scanning.)
- Passive Recon (OSINT): The detective sits across the street with binoculars. They look at the company's public website to find the CEO's name. They search LinkedIn to find out what software the IT team uses. They look at public property records. The company has *no idea* they are being investigated.
Passive Recon uses only publicly available information. It is generally legal and very hard for the target to detect.
4. Open-Source Intelligence (OSINT)
OSINT is data collected from publicly available sources to be used in an intelligence context. Sources include:
- Search engines (Google Dorking)
- Social media (LinkedIn, Twitter)
- Public government records
- Job postings (A goldmine for finding out what tech stack a company uses)
- GitHub repositories (Often accidentally containing hardcoded passwords)
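As a rough illustration of the GitHub point, the sketch below searches a local copy of a repository for lines that look like hard-coded credentials. The function name find_hardcoded_secrets and the keyword list are assumptions made for this example, not a complete secret scanner, and it should only be pointed at code you are permitted to review.

```bash
# Hypothetical helper: list lines in a directory tree that look like
# hard-coded credentials (keyword, then ":" or "=", then a quoted value).
find_hardcoded_secrets() {
  local dir pattern
  dir="$1"
  # The keyword list is illustrative only; real scanners use many more rules.
  pattern="(password|passwd|secret|api[_-]?key|token)[[:space:]]*[:=][[:space:]]*[\"'][^\"']+[\"']"
  # -r recurse, -n show line numbers, -E extended regex, -i ignore case.
  grep -rnEi --exclude-dir=.git -- "$pattern" "$dir"
}
```

Running `find_hardcoded_secrets ./cloned-repo` (a placeholder path) prints each matching file name, line number, and line.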
5. Domain Information (whois)
When a company registers a domain name (like example.com), they must provide contact information to the registrar. Historically, this included the physical address, phone number, and email of the IT administrator!
While privacy laws (like GDPR) now hide a lot of this data, running a whois query is still the first step of recon. It tells you when the domain was registered, when it expires, and which DNS servers it uses.
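To make this concrete, here is a minimal sketch that filters whois output down to the fields mentioned above. The helper name whois_fields is hypothetical, and the field labels are an assumption: they match what many gTLD registries print, but other registries label these fields differently.

```bash
# Hypothetical filter: keep only the creation date, expiry date, and
# name server lines from whois output read on standard input.
# The labels below are assumed; adjust them to the registry's wording.
whois_fields() {
  grep -Ei '^[[:space:]]*(creation date|registry expiry date|registrar registration expiration date|name server):'
}

# Usage (example.com is a placeholder; this queries the public WHOIS
# service, not the target's own servers):
#   whois example.com | whois_fields
```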
6. Google Dorking
Search engines index almost everything, including things companies didn't mean to make public. "Google Dorking" (or Google Hacking) uses advanced search operators to find sensitive files. Examples:
- site:example.com (Only shows results from that specific site).
- filetype:pdf (Only shows PDF files).
- intitle:"index of" (Can reveal open, unprotected server directories).
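These operators can also be combined in a single query by separating them with spaces. The small helper below (build_dork is a hypothetical name) only assembles the search string; it sends nothing to Google or to the target.

```bash
# Hypothetical helper: restrict a search to one domain and append any
# further operators, separated by spaces.
build_dork() {
  local query op
  query="site:$1"
  shift
  for op in "$@"; do
    query="$query $op"
  done
  printf '%s\n' "$query"
}

build_dork example.com 'filetype:pdf'        # prints: site:example.com filetype:pdf
build_dork example.com 'intitle:"index of"'  # prints: site:example.com intitle:"index of"
```

The resulting string is what you would paste into the search box.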
7. Mini Project: Perform Safe OSINT Research
Let's use the terminal to gather data on a domain without ever touching its actual web server.
Step-by-Step Walkthrough: *(Run these in your Kali Terminal)*
- 1. Whois Lookup: Let's look up the registration data for a demo site by running whois followed by its domain name.
*Scroll up through the output and look for "Registrant Organization" or "Name Servers".*
- 2. DNS Recon: Let's find out what IP address the domain points to (A record), and what mail servers it uses (MX records).
- 3. The Wayback Machine: Open your browser and go to archive.org. Search for a website. You can view what the website looked like 10 years ago. Attackers use this to find old, forgotten pages that might still exist on the server but are no longer linked on the main menu!
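The walkthrough steps can be sketched as shell functions. This is a minimal sketch that assumes the whois and dig utilities are installed (both are typically available on Kali) and uses example.com as a placeholder domain; the function names are invented for this example.

```bash
# Step 1: registration data from the public WHOIS service.
whois_lookup() {
  whois "$1"
}

# Step 2: A records (IP addresses) and MX records (mail servers),
# fetched through your DNS resolver. +short prints only the answers.
dns_lookup() {
  dig +short "$1" A
  dig +short "$1" MX
}

# Step 3: print the Wayback Machine URL that lists archived captures
# of a site, ready to open in a browser.
wayback_url() {
  printf 'https://web.archive.org/web/*/%s\n' "$1"
}

# Usage (nothing runs until you call a function):
#   whois_lookup example.com
#   dns_lookup example.com
#   wayback_url example.com
```

None of these calls connects to the target's web server: the first two go through the WHOIS service and your DNS resolver, and the third only prints a URL.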
8. Real-World Scenarios
A penetration tester is hired to test a financial firm. During the Passive Recon phase, the tester goes to LinkedIn and searches for the firm's IT employees. They find a junior developer's profile. Under "Experience," the developer boasts: *"Managed a fleet of 50 Apache Tomcat 7 servers running on Ubuntu 14.04."* The pentester hasn't even scanned the network yet, but they already know exactly what software and operating systems the company uses, and more importantly, they know the software is severely outdated and likely vulnerable.
9. Best Practices
- Metadata Scrubbing: When a company uploads a PDF document or a photograph to its public website, the file often contains hidden metadata (for photos, this is EXIF data). A photo taken on a smartphone can contain the GPS coordinates of where it was taken (when location tagging is enabled), the phone model, and the time. A PDF often contains the username of the employee who created it. Defensive Best Practice: Companies should scrub all metadata from files before publishing them to the public internet.
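To see what this kind of leak looks like, the sketch below prints the Author, Creator, and Producer entries from a PDF's document information dictionary. It assumes those entries are stored as plain, uncompressed text, which is common but not universal; pdf_info_fields is a hypothetical helper, and a dedicated metadata tool is the better choice for a real audit.

```bash
# Hypothetical helper: print /Author, /Creator and /Producer entries
# found as plain text in a PDF file.
# -a treats the binary file as text, -o prints only the matching part.
# Assumption: the entries are not compressed or encrypted.
pdf_info_fields() {
  grep -a -oE '/(Author|Creator|Producer)[[:space:]]*\([^)]*\)' "$1"
}

# Usage (report.pdf is a placeholder file name):
#   pdf_info_fields report.pdf
```

If the function prints an employee's username for a file on your own public website, that file is a candidate for scrubbing before it is republished.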
10. Legal and Ethical Notes
Passive reconnaissance is generally legal because you are only querying public databases (like Google or DNS registries), not the target's actual infrastructure. However, once you transition from Passive Recon (Googling) to Active Recon (sending packets directly to the target's IP address to see what ports are open), you MUST have explicit written authorization.
11. Exercises
- 1. Define OSINT. Provide three examples of public sources an attacker might use to gather information about a target company.
- 2. How can a company's public job postings on Indeed or LinkedIn unintentionally weaken their cybersecurity posture?
12. FAQs
Q: Is it illegal to use Google Dorks?
A: No, typing advanced search queries into Google is not illegal. However, if a Google Dork reveals a database file containing customer credit cards that was accidentally left public, downloading that file or using that data is a severe crime.
13. Interview Questions
- Q: Differentiate between Passive and Active Reconnaissance in the context of a penetration testing engagement. Why is Passive Recon conducted first?
- Q: Explain the security risk associated with document metadata (EXIF data). Detail a scenario where unscrubbed metadata could compromise an organization's physical or digital security.
14. Summary
In Chapter 5, we learned the art of digital observation. We defined the Reconnaissance phase and distinguished between the stealthy nature of Passive Recon and the noisy nature of Active Recon. We utilized OSINT techniques, DNS queries, and whois lookups to map an organization's digital footprint without ever interacting directly with their servers. We recognized that the human element (social media, job postings) is often the most revealing source of intelligence.