Computer Vision Tutorial
CHAPTER 09 Beginner

Optical Character Recognition (OCR)

Updated: May 14, 2026
20 min read

1. Introduction

Imagine taking a photo of a restaurant menu and wanting to copy-paste the text to send to a friend. To your phone, that photo is just a grid of colored pixels. It doesn't know there are letters in it. Optical Character Recognition (OCR) is the bridge between Computer Vision and Natural Language Processing. It is the technology that finds text in an image and converts it into editable, searchable digital strings.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define Optical Character Recognition (OCR).
  • Explain the two-step workflow of modern OCR systems.
  • Understand the challenges of reading handwriting and messy backgrounds.
  • Identify the most famous open-source OCR engine (Tesseract).

3. Beginner-Friendly Explanation

If you draw the letter "A" on a piece of paper, you know it's an "A". If you scan that paper into a computer, the computer just sees black pixels on a white background. You cannot highlight it, copy it, or search for it. OCR is an AI that acts like an incredibly fast typist. It scans the image, looks at the shapes of the black pixels, realizes "that shape looks exactly like the letter A," and types the letter "A" into a digital text document. It converts *images of text* into *actual text data*.
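
To make the idea of "matching pixel shapes to letters" concrete, here is a toy sketch that compares a small bitmap against a few hand-made letter templates and picks the closest one. The 5x5 templates and the `similarity` and `recognize` helpers are invented for illustration; real OCR engines use trained models rather than simple template comparison.

```python
# Toy 5x5 bitmaps: '#' is an ink (black) pixel, '.' is background.
# These templates are made up for illustration only.
TEMPLATES = {
    "A": [
        ".###.",
        "#...#",
        "#####",
        "#...#",
        "#...#",
    ],
    "L": [
        "#....",
        "#....",
        "#....",
        "#....",
        "#####",
    ],
    "T": [
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
}


def similarity(glyph, template):
    """Fraction of pixels that agree between two equally sized bitmaps."""
    total = sum(len(row) for row in template)
    matches = sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )
    return matches / total


def recognize(glyph):
    """Return the template letter whose pixels agree most with the glyph."""
    return max(TEMPLATES, key=lambda letter: similarity(glyph, TEMPLATES[letter]))


# A slightly damaged "A": one ink pixel is missing from the crossbar.
scanned = [
    ".###.",
    "#...#",
    "##.##",
    "#...#",
    "#...#",
]
print(recognize(scanned))
```

Even with one pixel missing, the scanned shape still agrees with the "A" template more than with the others, which is the intuition behind recognizing imperfect characters.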

4. The Two-Step OCR Workflow

Modern OCR isn't just one algorithm; it is a two-step pipeline:
  1. Text Detection (Where is it?): The AI scans the image (like a street sign or a receipt) and draws Bounding Boxes around areas that look like lines of text.
  2. Text Recognition (What does it say?): The AI crops out those specific bounding boxes and passes each one to a recognition neural network that predicts the sequence of letters and numbers inside the box.
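
The two steps above can be sketched in plain Python. This is a minimal illustration under simplifying assumptions: the image is a small grid of 0/1 pixels, detection is a classical row-wise projection profile rather than a neural network, and `fake_recognizer` is a hypothetical stand-in for a trained recognition model.

```python
def detect_text_lines(image):
    """Step 1 (detection): return (top, bottom) row ranges that contain ink.

    `image` is a list of rows; each pixel is 1 for ink and 0 for background.
    A row-wise projection profile stands in for a learned detector here.
    """
    boxes, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                    # a text line begins on this row
        elif not has_ink and start is not None:
            boxes.append((start, y))     # the line ended on the previous row
            start = None
    if start is not None:                # a line touches the bottom edge
        boxes.append((start, len(image)))
    return boxes


def run_ocr(image, recognize_line):
    """Step 2 (recognition): crop each detected line and pass it to a recognizer."""
    results = []
    for top, bottom in detect_text_lines(image):
        crop = image[top:bottom]
        results.append(((top, bottom), recognize_line(crop)))
    return results


# Two "lines" of ink separated by a blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]


def fake_recognizer(crop):
    """Hypothetical recognizer: only reports the crop height to show the data flow."""
    return f"<line of {len(crop)} pixel rows>"


for box, text in run_ocr(page, fake_recognizer):
    print(box, text)
```

Keeping detection and recognition as separate functions mirrors the pipeline: either stage can be swapped for a stronger model without touching the other.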

5. Why is OCR Difficult?

Reading a clean, scanned PDF is easy. Reading text in the "wild" is incredibly difficult.
  • Fonts and Handwriting: An AI trained only on the "Arial" font will likely fail to read a doctor's messy cursive handwriting, because the letter shapes differ so much from anything it saw during training.
  • Perspective: If you take a photo of a billboard from an angle, the text is skewed and warped.
  • Background Clutter: Trying to read a white license plate when there is snow and glare on the bumper confuses the edge detectors.

6. Real-World Applications

  • Automated Toll Booths / Speed Cameras: As you drive by, a camera takes a picture of your car. OCR reads the license plate pixels, converts them to a string ("ABC-123"), and queries the DMV database to send you a ticket.
  • Banking: When you use your bank app's "Mobile Deposit" feature, OCR reads the handwritten dollar amount and the printed routing numbers on the physical check.
  • Translation: The Google Translate app lets you point your camera at a foreign street sign. OCR extracts the foreign text, passes it to NLP for translation, and overlays the English text back onto your screen.

7. Tesseract: The King of Open Source OCR

Originally developed at Hewlett-Packard between 1985 and 1994, open-sourced in 2005, and sponsored by Google from 2006 to 2018, Tesseract is one of the most widely used open-source OCR engines in the world. Since version 4, it uses a Long Short-Term Memory (LSTM) neural network to recognize lines of text, and trained data is available for more than 100 languages.

8. Python Example: Using Tesseract

Using the pytesseract Python wrapper, we can extract text from an image in just a few lines of code.
python
import cv2
import pytesseract

# Note: You must install the Tesseract software on your OS before the Python wrapper will work!
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

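# 1. Load the image containing text
img = cv2.imread("scanned_receipt.jpg")
if img is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError("Could not read scanned_receipt.jpg")

# 2. Convert to grayscale (color is not needed to recognize character shapes)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 3. Run the OCR engine to extract the text as a string
extracted_text = pytesseract.image_to_string(gray)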

print("--- TEXT FOUND IN IMAGE ---")
print(extracted_text)
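
Beyond a plain string, pytesseract can also return per-word bounding boxes and confidence scores through `image_to_data`. The sketch below filters that output to keep only confident words. The helper names (`confident_words`, `words_from_image`), the cutoff of 60, and the `sample` dictionary are assumptions for illustration; the sample only mimics the keys that `image_to_data` returns with `output_type=pytesseract.Output.DICT`.

```python
def confident_words(data, min_conf=60.0):
    """Keep words whose confidence is at least `min_conf`, with their boxes.

    `data` is a dictionary shaped like the result of
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])    # may arrive as a string or a number
        if text.strip() and conf >= min_conf:
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            words.append((text, conf, box))
    return words


def words_from_image(path, min_conf=60.0):
    """Run Tesseract on an image file and return the confident words."""
    # Imported lazily so the filtering helper works without OpenCV/Tesseract installed.
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    data = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)
    return confident_words(data, min_conf)


# Hand-written sample in the same shape as Tesseract's output (not real results).
sample = {
    "text": ["", "TOTAL", "$12.5O"],
    "conf": ["-1", "96", "41"],
    "left": [0, 10, 80],
    "top": [0, 5, 5],
    "width": [200, 60, 70],
    "height": [40, 20, 20],
}
print(confident_words(sample))
```

Low-confidence words are often the ones worth sending for manual review rather than silently discarding; the right cutoff depends on the images involved.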

9. Mini Project

Audit the Preprocessing: You are trying to run OCR on a photograph of an old, faded, yellowed newspaper. The Tesseract engine is returning gibberish. What Image Processing step (from Chapter 3) should you apply to the image *before* passing it to Tesseract to make the faded ink pop out against the yellow paper? *(Answer: You should use Contrast Enhancement / Thresholding to push the faded ink toward pure black and the yellow paper toward pure white. High contrast between text and background greatly improves OCR accuracy).*
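
As a rough illustration of the contrast enhancement in the answer, the sketch below linearly stretches grayscale values so the darkest pixel becomes 0 and the lightest becomes 255. The `stretch_contrast` helper and the pixel values are made up for this example; it is one simple form of contrast enhancement, not the only option.

```python
def stretch_contrast(gray):
    """Linearly rescale pixel values so the darkest becomes 0 and the lightest 255."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                         # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]


# Made-up grayscale values: faded ink (about 120) on yellowed paper (about 180).
faded = [
    [180, 178, 181],
    [179, 120, 180],
    [181, 122, 179],
]
for row in stretch_contrast(faded):
    print(row)
```

After stretching, the ink pixels sit near 0 and the paper pixels near 255, so a later threshold has a much wider gap to work with. With OpenCV, `cv2.normalize` with `cv2.NORM_MINMAX` performs a comparable rescale.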

10. Best Practices

  • Binarization: One of the most effective ways to improve OCR accuracy is to convert your image to pure black and pure white (Binary) before running the engine. Removing grays and shadows makes the text shapes much more distinct, although no preprocessing step can guarantee perfect results.
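
One common way to pick the black/white cutoff automatically is Otsu's method, which chooses the threshold that best separates dark and light pixels. The following is a minimal pure-Python sketch with made-up pixel values; `otsu_threshold` and `binarize` are illustrative helper names, not library functions.

```python
def otsu_threshold(pixels):
    """Return the threshold t that maximizes between-class variance (Otsu's method).

    `pixels` is a flat list of integers in 0..255. Pixels <= t form the dark class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]                # number of pixels at or below t
        if w_dark == 0:
            continue
        w_light = total - w_dark         # number of pixels above t
        if w_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray):
    """Map dark pixels (ink) to 0 and light pixels (paper) to 255."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[0 if p <= t else 255 for p in row] for row in gray]


# Made-up grayscale patch: three dark ink pixels on light paper.
gray = [
    [200, 210, 205, 198],
    [202,  40,  55, 207],
    [199,  45, 204, 201],
]
for row in binarize(gray):
    print(row)
```

In OpenCV, the same idea is available through `cv2.threshold` with the `cv2.THRESH_BINARY | cv2.THRESH_OTSU` flags, which is usually preferable to a hand-written loop on real images.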

11. Common Mistakes

  • Relying on OCR for perfection: OCR is rarely 100% accurate. An "O" (the letter) and a "0" (the number) look nearly identical in many fonts. Pair OCR output with validation or basic NLP spell-checking logic to clean up the engine's mistakes.
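
A simple form of cleanup uses context: if a token is mostly digits, a stray "O" is probably a zero. The sketch below applies that rule. The look-alike mapping and the "more digits than letters" heuristic are assumptions chosen for illustration, and would need tuning for real data.

```python
# Letters that are easily confused with digits, and the digit they resemble.
# This mapping is an illustrative assumption, not an exhaustive list.
LETTER_TO_DIGIT = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def fix_numeric_token(token):
    """If a token is mostly digits, swap look-alike letters for digits."""
    digits = sum(ch.isdigit() for ch in token)
    letters = sum(ch.isalpha() for ch in token)
    if digits > letters:                 # looks like a number with a few slips
        return token.translate(LETTER_TO_DIGIT)
    return token                         # looks like a word: leave it alone


def clean_ocr_text(text):
    """Apply the numeric fix to every whitespace-separated token."""
    return " ".join(fix_numeric_token(tok) for tok in text.split())


print(clean_ocr_text("TOTAL $12.5O paid to BOB on 2O24-O5-14"))
```

Words such as "BOB" and "TOTAL" are left untouched because they contain no digits, while the price and the date have their letter "O" characters replaced with zeros. Note that this version collapses runs of whitespace into single spaces.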

12. Exercises

  1. Explain how OCR acts as the technological bridge between Computer Vision (CV) and Natural Language Processing (NLP).

13. Coding Challenges

Challenge 1: Write pseudocode for a parking garage system that logs the license plate of every car that enters.
text
camera_frame = capture_car_entering()

// Step 1: Object Detection to find the plate
plate_bounding_box = AI.detect_license_plate(camera_frame)
cropped_plate_image = crop_image(camera_frame, plate_bounding_box)

// Step 2: Preprocess for OCR
binary_plate = apply_high_contrast_threshold(cropped_plate_image)

// Step 3: OCR Extraction
plate_string = run_tesseract_ocr(binary_plate)

// Step 4: Log the entry with a timestamp
time_entered = current_time()
save_to_database(time_entered, plate_string)
print("Welcome, vehicle " + plate_string)
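
The pseudocode can be turned into a small Python sketch. Everything here is an assumption made for illustration: the detector and OCR engine are passed in as hypothetical callables, the "database" is a plain list, and the plate format (three letters, optional dash, three digits) is invented to match the "ABC-123" example.

```python
import re
from datetime import datetime, timezone

# Assumed plate format for illustration: three letters, optional dash, three digits.
PLATE_PATTERN = re.compile(r"^[A-Z]{3}-?[0-9]{3}$")


def normalize_plate(raw):
    """Uppercase the OCR output and drop whitespace and stray punctuation."""
    return re.sub(r"[^A-Z0-9-]", "", raw.upper())


def log_entry(frame, detect_plate, read_text, log, now=None):
    """Detect, read, validate, and log one plate. Returns the plate or None.

    `detect_plate(frame)` returns a cropped plate image or None.
    `read_text(image)` returns the raw OCR string.
    Both are hypothetical callables supplied by the caller.
    """
    plate_image = detect_plate(frame)
    if plate_image is None:
        return None                      # no plate found in this frame
    plate = normalize_plate(read_text(plate_image))
    if not PLATE_PATTERN.match(plate):
        return None                      # unreadable: leave for manual review
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    log.append((timestamp, plate))
    return plate


# Stubs standing in for a real detector and OCR engine.
entries = []
plate = log_entry("frame-1", lambda f: "crop", lambda img: " abc-123\n", entries)
print(plate, entries)
```

Validating the string before logging keeps obviously misread plates out of the record. In a real system, `read_text` might wrap `pytesseract.image_to_string` with Tesseract's single-line page segmentation mode (`--psm 7`), but that choice is a suggestion rather than a requirement.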

14. MCQs with Answers

Question 1

What is the primary function of Optical Character Recognition (OCR)?

Question 2

Why is "Text Detection" a necessary first step before "Text Recognition" in modern OCR systems?

15. Interview Questions

  • Q: Walk me through the Image Processing steps you would take to prepare a poorly lit, low-contrast photograph of a document for OCR extraction.
  • Q: Describe how OCR is utilized in automated license plate reading (ALPR) systems.

16. FAQs

Q: Can OCR read handwriting?
A: Yes, but with varying success. Traditional OCR engines like Tesseract struggle heavily with cursive. Modern Cloud APIs (like Google Cloud Vision or Amazon Textract) use large deep learning models trained on handwriting and are generally much more accurate, though still not perfect on messy writing.

17. Summary

In Chapter 9, we learned how computers read the physical world. Optical Character Recognition (OCR) is the process of detecting the location of text in an image and classifying those pixel shapes into digital strings. By applying heavy preprocessing to increase contrast, we can use OCR to automate data entry, digitize historical archives, and power real-time translation apps.

18. Next Chapter Recommendation

We know how to classify letters, but how do we classify massive, complex images like "Dog" vs "Cat"? Proceed to Chapter 10: Image Classification Basics to dive into the core task of deep learning vision.
