API Security Tutorial
CHAPTER 09 Intermediate

Input Validation and Data Sanitization

Updated: May 13, 2026
20 min read

1. Introduction

The Golden Rule of Backend Development is: Never trust user input. Every piece of data that enters your API—whether through the URL, Headers, or JSON Body—must be treated as highly suspicious. If you process data without checking it first, you open the door to fatal database injections, application crashes, and compromised servers. In this chapter, we will learn the crucial concepts of Input Validation (checking if data is correct) and Data Sanitization (cleaning data to make it safe), focusing on PHP implementation.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Understand the fundamental difference between Validation and Sanitization.
  • Implement strict type checking and validation rules for incoming API data.
  • Utilize PHP's built-in filter_var() function and its validation filters.
  • Recognize the dangers of bypassing backend validation.
  • Implement a robust validation layer before database execution.

3. Beginner-Friendly Explanation

Imagine a nightclub bouncer checking IDs.
  • Validation: The bouncer looks at the ID and asks, "Is this a valid driver's license? Are they over 21? Is the ID expired?" If the answer is no, the person is immediately rejected and sent away. *The data must fit strict rules.*
  • Sanitization: Someone tries to bring a water bottle into the club. The bouncer says, "You can come in, but I'm taking the cap off the bottle and pouring out any alcohol first." The bouncer actively *changes* the item to make it safe before allowing it inside. *The data is stripped of dangerous parts.*

In an API, if you expect an "Age", and the user sends "Age": "Twenty-Five", Validation rejects it because it's not an integer. If the user sends "Name": "John<script>", Sanitization removes the <script> tag before saving it to the database.

4. Real-World Attack Scenarios

  • Type Manipulation: A developer expects the price parameter to be an integer. An attacker sends a massive string or an array instead. The backend script throws an error when it tries to do arithmetic on the value, which can cause a Denial of Service (DoS) or, if error output is shown to the client, reveal file paths and database structure.
  • Excessive Payload (Resource Exhaustion): A registration API expects a username but fails to validate its length. An attacker sends a username that is 50 megabytes long. The server burns memory and CPU trying to process it, and enough of these requests can bring it down.
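
Both scenarios can be blocked by a few checks that run before any business logic. The sketch below is a minimal illustration, not a definitive implementation: the function name check_registration(), the 64 KB body limit, and the 50-byte username limit are all assumptions chosen for the example.

```php
<?php
/**
 * Hypothetical helper: returns an error message, or null when the input is acceptable.
 * The limits used here (64 KB body, 50-byte username) are illustrative assumptions.
 */
function check_registration(string $rawBody): ?string
{
    // Reject oversized bodies before spending any effort parsing them.
    if (strlen($rawBody) > 65536) {
        return 'Payload too large.';
    }

    $data = json_decode($rawBody, true);
    if (!is_array($data)) {
        return 'Body must be a JSON object.';
    }

    // An array or a string sent in place of a number fails this check.
    if (!isset($data['price']) || !is_int($data['price']) || $data['price'] < 0) {
        return 'price must be a non-negative integer.';
    }

    // strlen() counts bytes, so this also caps memory used by the field.
    if (!isset($data['username']) || !is_string($data['username'])
        || $data['username'] === '' || strlen($data['username']) > 50) {
        return 'username must be a string of 1 to 50 bytes.';
    }

    return null;
}
```

A caller would pass the raw request body to this function and answer with a 400 status whenever it returns a message instead of null.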

5. Validation vs Sanitization Rule of Thumb

  • Validate First: Always prefer strict validation. If the data isn't exactly what you expect (e.g., you asked for an email, they gave you a physical address), reject the request with a 400 Bad Request. Do not try to guess what they meant.
  • Sanitize Second: Only sanitize when necessary (e.g., stripping HTML tags from a blog comment). If you over-sanitize, you might accidentally corrupt valid data (like an Irish last name with an apostrophe: O'Connor).
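
To make the O'Connor point concrete, here is a small sketch that compares the two approaches. Both function names and the name pattern are assumptions made up for this example; real-world name rules vary and this pattern is not a universal standard.

```php
<?php
/**
 * Hypothetical validator: accepts letters, spaces, hyphens and apostrophes,
 * 1 to 100 characters, starting with a letter. The pattern is an assumption.
 */
function is_valid_last_name(string $name): bool
{
    return preg_match('/\A[\p{L}][\p{L} \'\-]{0,99}\z/u', $name) === 1;
}

/**
 * Hypothetical over-sanitizer: silently strips quote characters.
 * Shown only to demonstrate how this approach corrupts valid data.
 */
function strip_quotes(string $name): string
{
    return str_replace(["'", '"'], '', $name);
}

$accepted  = is_valid_last_name("O'Connor");      // valid name passes unchanged
$rejected  = is_valid_last_name("John<script>");  // invalid input is refused outright
$corrupted = strip_quotes("O'Connor");            // the apostrophe is lost
```

Validation leaves the legitimate value untouched and refuses the bad one, while the stripping approach quietly stores a different name than the user typed.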

6. PHP Examples (Validation)

PHP provides excellent built-in filters. Use them!
php
<?php
// Read the fields defensively: a missing key becomes null instead of raising a warning.
$email = $_POST['email'] ?? null;
$age = $_POST['age'] ?? null;

// 1. VALIDATION: Check if it's a real email format
if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
    http_response_code(400);
    echo json_encode(["error" => "Invalid email format."]);
    exit;
}

// 2. VALIDATION: Check if age is an integer between 18 and 120
// filter_var() returns the integer on success and false on failure, so compare with === false.
$age = filter_var($age, FILTER_VALIDATE_INT, ["options" => ["min_range" => 18, "max_range" => 120]]);
if ($age === false) {
    http_response_code(400);
    echo json_encode(["error" => "Age must be a number between 18 and 120."]);
    exit;
}

echo "Data is valid! Proceed to database.";
?>

7. PHP Examples (Sanitization)

If you must clean data before processing it:
php
<?php
$dirty_string = "<h1>Hello</h1> <script>alert('hack')</script>";

// SANITIZATION: Strip tags and encode quotes
// (FILTER_SANITIZE_STRING is deprecated as of PHP 8.1; shown here for illustration only)
$clean_string = filter_var($dirty_string, FILTER_SANITIZE_STRING);

// Result: Hello alert(&#39;hack&#39;) (tags are stripped and quotes are HTML-encoded)
?>

*Note: As of PHP 8.1, FILTER_SANITIZE_STRING is deprecated. Best practice is to use htmlspecialchars() for sanitizing output, which we will cover in Chapter 11.*
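
As a brief preview of that replacement, the sketch below wraps htmlspecialchars() in a small helper. The helper name escape_html() is an assumption for this example; the flags shown are a common choice rather than the only correct one.

```php
<?php
/**
 * Hypothetical helper: escape a value at the point where it is written into HTML.
 * ENT_QUOTES encodes both quote styles; ENT_SUBSTITUTE replaces invalid UTF-8
 * sequences instead of returning an empty string.
 */
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = "<script>alert('hack')</script>";
$safe = escape_html($comment);
// $safe is now: &lt;script&gt;alert(&#039;hack&#039;)&lt;/script&gt;
```

Unlike tag stripping, nothing is deleted: the browser displays the original text literally instead of executing it.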

8. Validating JSON Payloads

When building a modern API, users send JSON. You must validate the structure of the JSON before processing it.
php
<?php
$raw_data = file_get_contents("php://input");
$data = json_decode($raw_data, true);

// 1. Check if JSON decoding failed (malformed JSON) or the top level is not an object
if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
    http_response_code(400);
    echo json_encode(["error" => "Invalid JSON payload."]);
    exit;
}

// 2. Check if required fields exist
if (!isset($data['username']) || !isset($data['password'])) {
    http_response_code(400);
    echo json_encode(["error" => "Missing username or password."]);
    exit;
}

// 3. Check data types and length (strlen() counts bytes, not characters)
if (!is_string($data['username']) || !is_string($data['password']) || strlen($data['username']) > 50) {
    http_response_code(400);
    echo json_encode(["error" => "Username and password must be strings; username is limited to 50 bytes."]);
    exit;
}
?>

9. Best Practices

  • Allow-listing over Block-listing: Don't try to guess every bad thing an attacker might send (Block-listing). Instead, define exactly what a "good" input looks like (Allow-listing) and reject everything else. For example, if a field is "State", don't check for <script>. Check if the input perfectly matches one of the 50 US state codes.
  • Fail Fast: Validate data at the very beginning of your script. If data is invalid, return a 400 Bad Request immediately to save server processing power.
  • Provide Useful Errors: Tell the client exactly what failed: {"error": "The 'age' parameter must be an integer."}
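
The three practices above can be combined in one small validator. This is a sketch under stated assumptions: the function name validate_address() is hypothetical, and the ALLOWED_STATES list is abbreviated to four codes for readability where a real list would contain every accepted code.

```php
<?php
// Abbreviated allow-list for illustration; a real list would hold every accepted code.
const ALLOWED_STATES = ['CA', 'NY', 'TX', 'WA'];

/**
 * Hypothetical validator: returns a list of field-specific error messages.
 * An empty list means the input passed.
 */
function validate_address(array $input): array
{
    $errors = [];
    $state = $input['state'] ?? null;

    // Strict comparison (third argument true) prevents loose type juggling.
    if (!is_string($state) || !in_array($state, ALLOWED_STATES, true)) {
        $errors[] = "The 'state' parameter must be one of: " . implode(', ', ALLOWED_STATES) . '.';
    }

    return $errors;
}
```

To fail fast, a script would call this at the very top and, if the returned list is not empty, send a 400 status with the messages and stop before touching the database.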

10. Common Mistakes

  • Relying on HTML5 Frontend Validation: Using <input type="email" required> on the frontend is great for UX, but it provides ZERO security. An attacker can bypass the browser entirely and send raw requests with a tool like Postman or curl. You MUST validate on the backend.
  • Trusting HTTP Headers: Developers often validate the Body but trust Headers (like User-Agent or X-Forwarded-For). Attackers can inject malicious scripts or SQL into Headers just as easily.
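
Headers deserve the same treatment as body fields. The sketch below validates X-Forwarded-For before use; the function name forwarded_ip() is an assumption, and it takes the server array as a parameter (normally $_SERVER) so it can be exercised without a live request.

```php
<?php
/**
 * Hypothetical helper: returns the first X-Forwarded-For entry if it is a
 * syntactically valid IP address, otherwise null.
 */
function forwarded_ip(array $server): ?string
{
    $header = $server['HTTP_X_FORWARDED_FOR'] ?? '';
    if (!is_string($header) || $header === '') {
        return null;
    }

    // The header may hold a comma-separated chain; inspect the first entry only.
    $first = trim(explode(',', $header)[0]);
    $ip = filter_var($first, FILTER_VALIDATE_IP);

    return $ip === false ? null : $ip;
}
```

A value that passes this check is well-formed, not trustworthy: the client can still set the header to any valid address, so it should not drive security decisions unless a proxy you control sets it.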

11. Mini Exercises

  1. What is the difference between Validation and Sanitization?
  2. If a user submits their age as the string "twenty", should the API attempt to sanitize it, or validate and reject it?

12. Practice Challenges

Challenge: Write a PHP snippet that receives a JSON payload containing an array of tags (e.g., ["php", "security", "api"]). Write validation logic that ensures:
  1. tags exists.
  2. tags is an array.
  3. The array contains no more than 5 tags.
  4. Each tag is a string.

13. MCQs with Answers

Question 1

Which process involves checking if incoming data meets strict criteria (e.g., is it an integer? is it an email?) and rejecting it if it fails?

Question 2

Why is frontend (JavaScript/HTML) validation insufficient for API security?

Question 3

What PHP function is explicitly designed to both validate and sanitize variables based on predefined filters?

14. Interview Questions

  • Q: Explain the phrase "Never trust user input" in the context of API development.
  • Q: Explain the difference between Allow-listing (White-listing) and Block-listing (Black-listing) for data validation. Which is more secure?
  • Q: If you are building a PHP API, what steps do you take to validate an incoming raw JSON body payload?

15. FAQs

Q: Should I use Regular Expressions (Regex) for validation?
A: Yes, Regex is extremely powerful for Allow-listing complex patterns (like validating a custom Product ID format: PRD-12345). However, complex Regex can be vulnerable to ReDoS (Regular Expression Denial of Service) attacks: a crafted input can trigger catastrophic backtracking, so a single match consumes CPU time that grows exponentially with the input length. Use built-in filters when possible, and simple Regex when necessary.
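
A simple, anchored pattern combined with a length cap keeps Regex validation safe. The sketch below assumes the Product ID is exactly the prefix PRD- followed by five digits; the function name is_valid_product_id() and the 16-byte cap are assumptions for this example.

```php
<?php
/**
 * Hypothetical validator for IDs shaped like PRD-12345.
 * Assumes exactly five digits; adjust the pattern to the real format.
 */
function is_valid_product_id($value): bool
{
    // Cap the length first so the Regex never sees a huge input.
    if (!is_string($value) || strlen($value) > 16) {
        return false;
    }

    // \A and \z anchor the whole string; there are no nested quantifiers,
    // so there is nothing for the engine to backtrack over.
    return preg_match('/\APRD-\d{5}\z/', $value) === 1;
}
```

Using \z instead of $ matters here, because $ would also accept a value that ends with a trailing newline.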

16. Summary

In this chapter, we cemented the foundational rule of backend security: never trust the client. We defined the critical differences between Validation (rejecting bad data) and Sanitization (cleaning bad data). We demonstrated how to validate JSON structure, enforce data types, and utilize PHP's filter_var() to ensure that malicious, malformed, or oversized data is blocked at the gate before it ever reaches the application logic.

17. Next Chapter Recommendation

What happens if unvalidated data actually makes it into your database queries? Disaster. Proceed to Chapter 10: Protecting Against SQL Injection to learn how to defend your MySQL databases from one of the oldest and most damaging attacks on the internet.
