CHAPTER 04 Beginner

Understanding Large Language Models (LLMs)

Updated: May 14, 2026
20 min read

1. Introduction

The engine driving the modern Generative AI revolution is the Large Language Model (LLM). Systems like GPT-4, Claude, and Gemini are all LLMs. But what exactly makes them "Large," and how do they process human language? In this chapter, we will break open the black box of an LLM, exploring critical concepts like Tokens, Context Windows, and Parameters.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define what a Large Language Model is.
  • Understand how AI breaks words into "Tokens".
  • Explain the concept and limitations of the "Context Window".
  • Comprehend what "Parameters" are and why model size matters.

3. Beginner-Friendly Explanation

Imagine a master linguist who has spent 5,000 years reading every book, article, and website ever published. If you give this linguist the sentence, *"The cat chased the..."*, they don't have to think very hard to guess that the next word is probably *"mouse"*. An LLM is a massive mathematical engine that does much the same thing. It has been trained on billions of pages of text, learning the statistical patterns of how humans use words. When you ask an LLM a question, it is essentially playing the world's most advanced, high-speed game of autocomplete: it repeatedly predicts the most likely next piece of text based on your prompt and the patterns it learned during training.
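The autocomplete idea can be sketched with a toy model that counts which word follows each pair of words. This is only an illustration under simplifying assumptions: the corpus is made up, and the function names `build_counts` and `predict_next` are hypothetical. Real LLMs learn these patterns with neural networks rather than lookup tables.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
corpus = (
    "the cat chased the mouse . "
    "the cat chased the mouse . "
    "the dog chased the cat . "
    "the cat ate the fish ."
)

def build_counts(text, context_size=2):
    """Count which word follows each run of `context_size` words."""
    words = text.split()
    counts = defaultdict(Counter)
    for i in range(len(words) - context_size):
        context = tuple(words[i:i + context_size])
        counts[context][words[i + context_size]] += 1
    return counts

def predict_next(counts, prompt, context_size=2):
    """Return next-word probabilities, most likely first."""
    context = tuple(prompt.lower().split()[-context_size:])
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.most_common()}

counts = build_counts(corpus)
prediction = predict_next(counts, "The cat chased the")
print(prediction)  # "mouse" ranks first in this toy corpus
```

In this corpus, "chased the" is followed by "mouse" twice and "cat" once, so the sketch ranks "mouse" first with a probability of about 0.67.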

4. What makes it "Large"?

LLMs are "Large" in two ways:
  1. Training Data: They are trained on a substantial portion of the entire public internet (terabytes of text).
  2. Parameters: Think of a parameter as a single adjustable mathematical connection (loosely, a synapse) in the AI's artificial brain. Early models had a few million parameters. Today's largest models have hundreds of billions, and some are reported to exceed 1 trillion. More parameters generally let a model capture more complex patterns and reasoning, although training data and training method matter as well.
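To make "parameter" concrete, here is a minimal sketch that counts the weights and biases in a small fully connected network. The layer sizes are hypothetical and the helper name `count_dense_parameters` is made up for this example. Real LLMs use a more elaborate architecture, but their parameter counts are tallied in the same spirit.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    Each pair of adjacent layers contributes (inputs * outputs) weights
    plus one bias per output.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Hypothetical toy network: 512 inputs, a hidden layer of 2048, 512 outputs.
toy_parameters = count_dense_parameters([512, 2048, 512])
print(f"{toy_parameters:,} parameters")  # 2,099,712 parameters
```

Even this toy network has about 2 million parameters, which gives a sense of how quickly the count grows as layers get wider and deeper.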

5. Tokens: The Currency of AI

Computers do not understand letters or words; they only understand numbers. Before an LLM reads your prompt, a tokenizer chops your text into pieces called Tokens and maps each piece to a numeric ID.
  • A token can be an entire word (e.g., apple).
  • A token can be a chunk of a word (e.g., Ham + bur + ger). The exact split depends on the tokenizer.
*Rule of thumb:* 1 Token is roughly equal to 3/4 of a standard English word (or 4 characters). When you pay for API access to a provider such as OpenAI, you are billed *per token*, not per word!
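The rule of thumb translates directly into a quick estimate. The sketch below is a rough approximation, not a real tokenizer, and the price used is a hypothetical placeholder rather than any provider's actual rate.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text
CHARS_PER_TOKEN = 4     # rule of thumb from the text

def estimate_tokens_from_words(word_count):
    """Estimate tokens from a word count (about 0.75 words per token)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)

def estimate_tokens_from_text(text):
    """Estimate tokens from raw text (about 4 characters per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def estimate_cost(token_count, price_per_million_tokens):
    """Estimate cost given a price per one million tokens."""
    return token_count / 1_000_000 * price_per_million_tokens

article_tokens = estimate_tokens_from_words(1500)
# Hypothetical price of 5.00 per million tokens, for illustration only.
article_cost = estimate_cost(article_tokens, 5.00)
print(f"~{article_tokens} tokens, ~{article_cost:.4f} in cost")
```

Estimates like this are fine for budgeting, but a real tokenizer can differ noticeably, especially for code, numbers, or non-English text.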

6. The Context Window (Memory Limit)

An LLM does not have infinite memory. The Context Window is the maximum number of tokens the AI can "hold in its head" at one single time.
  • If a model has a Context Window of 8,000 tokens (approx. 6,000 words), and you paste a 10,000-word essay (approx. 13,300 tokens) into the prompt, the essay does not fit. Depending on the application, the request is rejected with an error or the oldest text is cut off, in which case the AI never sees roughly the first 4,000 words!
  • Modern breakthroughs have pushed Context Windows to massive sizes. Google's Gemini 1.5 Pro supports a context window of up to 2 Million tokens, allowing you to upload entire textbooks and hour-long videos in a single prompt.
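The 8,000-token example can be checked with a few lines of arithmetic. This is a sketch that relies on the rough 0.75 words-per-token estimate; the function name `context_budget` and the `reserved_for_reply` parameter are assumptions introduced for illustration.

```python
import math

def words_to_tokens(word_count, words_per_token=0.75):
    """Rough conversion using the words-per-token rule of thumb."""
    return math.ceil(word_count / words_per_token)

def context_budget(context_window, prompt_words, reserved_for_reply=0):
    """Report whether a prompt, plus room for a reply, fits the window."""
    prompt_tokens = words_to_tokens(prompt_words)
    overflow = prompt_tokens + reserved_for_reply - context_window
    return {
        "prompt_tokens": prompt_tokens,
        "fits": overflow <= 0,
        "overflow_tokens": max(overflow, 0),
    }

report = context_budget(context_window=8000, prompt_words=10000)
print(report)
# {'prompt_tokens': 13334, 'fits': False, 'overflow_tokens': 5334}
```

The overflow of about 5,300 tokens corresponds to roughly 4,000 words. Setting `reserved_for_reply` to a value above zero is a reminder that the model's answer also has to fit inside the same window.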

7. JSON Example: How APIs Count Tokens

When developers send data to an LLM, the API often returns metadata showing exactly how many tokens were consumed in the transaction.
json
{
  "id": "chatcmpl-12345",
  "object": "chat.completion",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Paris is the capital of France."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 7,
    "total_tokens": 21
  }
}
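A response like this can be read with Python's standard `json` module. The sketch below assumes the field names shown in the sample response; other providers may name or nest these fields differently.

```python
import json

# The sample response from above, as raw JSON text.
response_text = """
{
  "id": "chatcmpl-12345",
  "object": "chat.completion",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Paris is the capital of France."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 7,
    "total_tokens": 21
  }
}
"""

response = json.loads(response_text)
reply = response["choices"][0]["message"]["content"]
usage = response["usage"]

print(f"Reply: {reply}")
print(f"Prompt tokens:     {usage['prompt_tokens']}")
print(f"Completion tokens: {usage['completion_tokens']}")
print(f"Total tokens:      {usage['total_tokens']}")
```

Note that the total is the sum of the prompt and completion tokens (14 + 7 = 21), so both what you send and what the model writes back count toward usage.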

8. Python Example: Token Estimation

If you are building an AI app, you can use the tiktoken library (OpenAI's official tokenizer) to count tokens *before* sending them to the API, ensuring you don't exceed the Context Window.
python
import tiktoken

# Load the tokenizer that tiktoken associates with GPT-4
tokenizer = tiktoken.encoding_for_model("gpt-4")

my_prompt = "Generative AI is absolutely fascinating!"

# Convert the text into a list of integer token IDs
tokens = tokenizer.encode(my_prompt)

print(f"The text was chopped into {len(tokens)} tokens.")
print(f"Token IDs: {tokens}")
# Example output; the exact count and IDs depend on the encoding:
# The text was chopped into 7 tokens.
# Token IDs: a list of integers, one per token

9. Mini Project

Calculate the Limits: You are using an open-source LLM with a maximum Context Window of 4,000 tokens. You have a conversation history with the bot that is 3,500 words long. Can you ask the bot a new question and get a response without it forgetting the beginning of the conversation? *(Answer: No. At roughly 0.75 words per token, 3,500 words is about 4,700 tokens (3,500 ÷ 0.75 ≈ 4,667). The history alone already exceeds the 4,000-token Context Window, before counting the new question or the reply. The earliest messages must be dropped or summarized, so the AI will effectively "forget" them).*

10. Best Practices

  • Mind the Context: When building chatbots, developers must write code that actively truncates or summarizes the conversation history as the chat gets longer. Otherwise the conversation will eventually exceed the Context Window limit, causing API errors or silently dropped messages.
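One simple truncation strategy keeps only the most recent messages that fit a token budget. The sketch below is a minimal illustration: the function names are hypothetical, the message format is an assumption, and token counts use the rough 4-characters-per-token estimate. A production app would count with a real tokenizer such as tiktoken.

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token, rounded up."""
    return max(1, -(-len(text) // 4))

def truncate_history(messages, max_tokens):
    """Keep the newest messages whose estimated total fits max_tokens."""
    kept = []
    used = 0
    # Walk backwards so the most recent messages are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept

history = [
    {"role": "user", "content": "a" * 40},       # about 10 tokens
    {"role": "assistant", "content": "b" * 40},  # about 10 tokens
    {"role": "user", "content": "c" * 40},       # about 10 tokens
]
trimmed = truncate_history(history, max_tokens=25)
print([m["content"][0] for m in trimmed])  # ['b', 'c']
```

Dropping the oldest messages is the simplest policy. Summarizing them into a single short message is a common alternative when early context still matters.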

11. Common Mistakes

  • Assuming LLMs read letter-by-letter: Because LLMs read "Tokens" (chunks of words), they are often unreliable at spelling tasks or counting specific letters. If you ask an LLM, "How many r's are in the word strawberry?", it may answer incorrectly because it doesn't see "s-t-r-a-w-b-e-r-r-y"; it sees one or a few numeric token IDs that stand for chunks of the word.
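When an exact letter count matters, ordinary code is the reliable tool. The sketch below counts characters directly and then shows a hypothetical split of the word into chunks. That split is made up for illustration and is not the output of any real tokenizer.

```python
word = "strawberry"

# Plain string operations see every character.
r_count = word.count("r")
print(f"'{word}' contains {r_count} r's")  # 'strawberry' contains 3 r's

# Hypothetical chunking, for illustration only (not a real tokenizer's output).
pieces = ["str", "aw", "berry"]
r_per_piece = [piece.count("r") for piece in pieces]
print(r_per_piece)  # [1, 0, 2]
```

A model that receives only an ID for each chunk has no direct view of the letters inside it, which is why a letter count that code gets right every time can trip up an LLM.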

12. Exercises

  1. Explain the difference between an LLM's "Training Data" and its "Context Window".

13. MCQs with Answers

Question 1

In the world of LLMs, what is a "Token"?

Question 2

If you paste a novel into an AI prompt that exceeds the model's Context Window limit, what will happen?

14. Interview Questions

  • Q: What is a Context Window, and what strategies would you use in software development to ensure a user's chatbot session doesn't exceed it?
  • Q: Explain why LLMs sometimes struggle with character-level manipulation (like counting specific letters in a word) due to tokenization.

15. FAQs

Q: Do LLMs actually understand what I am saying?
A: Mathematically, no. An LLM has no consciousness, no emotions, and no understanding of the real world. It does not "know" what an apple tastes like. It simply knows that the mathematical token for "apple" is highly correlated with the tokens for "red", "sweet", and "fruit". It simulates understanding through staggeringly complex statistical probabilities.

16. Summary

In Chapter 4, we looked inside Large Language Models. These massive networks are trained on trillions of words and contain billions of parameters, with the largest reportedly exceeding a trillion. They do not read English directly; they read numeric "Tokens." They are also bound by the fixed size of their "Context Window," which dictates how much text they can take into account at once during a conversation. Understanding these mechanics is the key to mastering Generative AI.

17. Next Chapter Recommendation

We know what the model is made of, but how does it actually predict the text? Proceed to Chapter 5: How Generative AI Works to dive into the statistics of prediction.
