Prompt Engineering Tutorial
CHAPTER 12 Beginner

Prompting for Data Analysis and Research

Updated: May 14, 2026
25 min read

1. Introduction

Historically, if you wanted to find a specific trend in a 50-page PDF or a large Excel spreadsheet, you had to read it manually or write complex SQL queries. Today, LLMs can process long documents in seconds, extract key data points, and surface trends that are easy to miss. In this chapter, we will explore how to use prompt engineering to turn an AI into a tireless research assistant whose output you can verify.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Use prompts for rapid document summarization.
  • Engineer prompts to extract specific data from messy text.
  • Utilize AI for sentiment analysis and trend identification.
  • Understand the limits of AI math in data analysis.

3. Beginner-Friendly Explanation

Imagine walking into a massive library to research the American Civil War. Without AI, you have to read 20 books to find the exact date and location of one specific battle. With AI Data Analysis, you hand all 20 books to a digital librarian and say, *"Read all of these right now. Hand me back a single piece of paper with a list of every battle, the date it happened, and who won, formatted as a table."* The digital librarian does it in seconds. By engineering the prompt to extract specific data, you skip most of the manual reading and jump straight to the analysis.

4. The Summarization Prompt

Summarization is the most common use case for AI research. However, a vague *"Summarize this"* prompt often yields a generic summary that misses what you actually care about. You must define the *perspective* of the summary.

Engineered Summarization Prompt:

text
Role: Executive Business Assistant.
Task: Summarize the following 10-page Q3 Earnings Call transcript.
Constraints:
- Focus EXCLUSIVELY on mentions of our competitor, "Acme Corp".
- Ignore all financial metrics about our own company.
- Format the summary as 3 bullet points.
Transcript: """ [Paste Transcript] """
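
If you reuse this prompt structure across many documents, it can help to assemble it from a small template. The following is a minimal sketch rather than a standard API: the helper name `build_summary_prompt` and its parameters are hypothetical and exist only for illustration.

```python
def build_summary_prompt(role: str, document_type: str, focus: str,
                         ignore: str, bullet_count: int, transcript: str) -> str:
    """Assemble a perspective-specific summarization prompt (hypothetical helper)."""
    return (
        f"Role: {role}.\n"
        f"Task: Summarize the following {document_type}.\n"
        "Constraints:\n"
        f"- Focus EXCLUSIVELY on {focus}.\n"
        f"- Ignore {ignore}.\n"
        f"- Format the summary as {bullet_count} bullet points.\n"
        f'Transcript: """ {transcript} """'
    )


# Rebuild the prompt shown above from its parts.
prompt = build_summary_prompt(
    role="Executive Business Assistant",
    document_type="Q3 Earnings Call transcript",
    focus='mentions of our competitor, "Acme Corp"',
    ignore="all financial metrics about our own company",
    bullet_count=3,
    transcript="[Paste Transcript]",
)
print(prompt)
```

Changing only the `focus` and `ignore` arguments produces a summary from a different perspective without rewriting the rest of the prompt.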

5. Data Extraction (Messy to Clean)

AI is brilliant at taking messy, unstructured text (like a scrambled email) and converting it into clean, structured data (like a spreadsheet or JSON).

The Extraction Prompt:

text
Task: Extract all dates, names, and dollar amounts from the messy text below.
Format: Output a Markdown table with the columns: [Name, Date, Amount].
Constraint: If a value is missing, write "N/A".

Text: """
Hi it's John, I paid $50 on Tuesday the 4th. Sarah didn't pay her $100 bill. Mike paid on October 12th but I forgot the amount.
"""
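
Once the model returns a Markdown table, you will often want to load it into a program. The sketch below shows one way to parse a simple table into Python dictionaries. The function `parse_markdown_table` is a hypothetical helper, and `sample_output` is a hand-written example of what a model might return for the text above, not real model output.

```python
def parse_markdown_table(table_text: str) -> list[dict[str, str]]:
    """Turn a simple Markdown table into a list of row dictionaries."""
    rows = []
    for line in table_text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose the model wrote around the table
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Drop the separator row made only of dashes and colons, e.g. | --- | --- |
    body = [r for r in body if not all(c and set(c) <= set("-: ") for c in r)]
    return [dict(zip(header, r)) for r in body]


# Hand-written stand-in for a model response (assumption, for illustration only).
sample_output = """
| Name | Date | Amount |
| --- | --- | --- |
| John | Tuesday the 4th | $50 |
| Sarah | N/A | $100 |
| Mike | October 12th | N/A |
"""

records = parse_markdown_table(sample_output)
# The "N/A" constraint makes incomplete rows easy to find.
incomplete = [r["Name"] for r in records if "N/A" in r.values()]
print(records)
print("Rows with missing values:", incomplete)
```

This parser assumes no cell contains a pipe character. For anything more complex, asking the model for JSON and parsing it with the `json` module is usually more robust.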

6. Sentiment Analysis and Trend Spotting

If a business receives 5,000 customer reviews, reading them all by hand is impractical. You can prompt the AI to surface the recurring themes instead.

The Trend Prompt:

text
Task: Read the following 100 customer reviews for our new coffee machine.
Identify the 3 most common complaints. Provide a direct quote from the reviews for each complaint to prove it.
Output: A bulleted list.
Reviews: """ [Paste Reviews] """
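
When the reviews live in a list rather than a pasted block, the prompt can be built programmatically. The sketch below numbers each review so the model can say which review a quote came from. The helper `build_trend_prompt` and the sample reviews are hypothetical, and asking for the review number is an addition to the prompt above.

```python
def build_trend_prompt(reviews: list[str], product: str, top_n: int = 3) -> str:
    """Build a complaint-spotting prompt with numbered reviews (hypothetical helper)."""
    # Collapse internal line breaks so each review stays on one numbered line.
    numbered = "\n".join(
        f"{i}. {' '.join(text.split())}" for i, text in enumerate(reviews, start=1)
    )
    return (
        f"Task: Read the following {len(reviews)} customer reviews for {product}.\n"
        f"Identify the {top_n} most common complaints. For each complaint, provide "
        "a direct quote and the number of the review it came from.\n"
        "Output: A bulleted list.\n"
        f'Reviews: """\n{numbered}\n"""'
    )


# Invented sample data, for illustration only.
sample_reviews = [
    "The water tank leaks.",
    "Too loud in the\nmorning.",
    "Leaks after a week.",
]
trend_prompt = build_trend_prompt(sample_reviews, "our new coffee machine")
print(trend_prompt)
```

Numbered reviews make it quicker to look up each quoted complaint in the original data.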

7. The Math Limitation (Advanced Data Analysis)

*Warning:* Standard LLMs (like the text box in basic ChatGPT) cannot do math reliably. If you give an LLM a spreadsheet and say, *"Calculate the average revenue,"* it may return a plausible-looking but wrong number, because it predicts text rather than performing the calculation. *The Solution:* Modern tools (like ChatGPT Advanced Data Analysis) have the LLM write a Python script behind the scenes, run the math in Python, and report the computed result. Always use "Code Interpreter" or specialized data tools when dealing with heavy numbers.
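
To make the idea concrete, here is a minimal sketch of the kind of script such a tool might run: the arithmetic is done by Python's standard library, not by the language model. The CSV data and the helper name `average_column` are invented for illustration.

```python
import csv
import io
import statistics

# Hypothetical data standing in for a real spreadsheet export.
CSV_TEXT = """month,revenue
Jan,1200.50
Feb,980.00
Mar,1430.25
"""


def average_column(csv_text: str, column: str) -> float:
    """Compute the mean of one numeric column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return statistics.mean(values)


average_revenue = average_column(CSV_TEXT, "revenue")
print(f"Average revenue: {average_revenue:.2f}")
```

The same function gives the same answer every time and scales to thousands of rows, which is exactly what a text-predicting model cannot promise.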

8. Python Example: Batch Processing Reviews

Developers use APIs to analyze thousands of data points automatically.
python
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# A small sample; in practice this list could hold thousands of reviews.
reviews = ["The food was cold.", "Loved the ambient lighting!", "Terrible service."]

for review in reviews:
    # Constrain the output to a single label so it is easy to parse.
    prompt = (
        "Analyze the sentiment of this review. "
        "Output exactly ONE word: Positive, Negative, or Neutral. "
        f"Review: '{review}'"
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Strip stray whitespace from the model's reply before printing.
    sentiment = response.choices[0].message.content.strip()
    print(f"Review: {review} -> Sentiment: {sentiment}")
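
Even with an "exactly ONE word" instruction, a model may occasionally add punctuation or a full sentence. The sketch below shows one way to normalize the labels and count them without calling the API. The `raw_outputs` list is an invented stand-in for model replies, and `normalize_label` is a hypothetical helper.

```python
from collections import Counter

ALLOWED_LABELS = {"positive", "negative", "neutral"}


def normalize_label(raw_output: str) -> str:
    """Map raw model text to one of the allowed labels, or 'Unknown'."""
    cleaned = raw_output.strip().strip(".!\"'").lower()
    return cleaned.capitalize() if cleaned in ALLOWED_LABELS else "Unknown"


# Invented stand-ins for model replies (assumption, for illustration only).
raw_outputs = ["Negative", " positive.\n", "Negative", "The sentiment is mixed."]

counts = Counter(normalize_label(text) for text in raw_outputs)
print(counts)
```

Tracking an explicit "Unknown" bucket shows how often the model ignored the format, which is a useful signal that the prompt needs tightening.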

9. Mini Project

The Legal Assistant: You are given a 30-page End User License Agreement (EULA). Write a prompt to find out if the company is allowed to sell your personal data. *(Answer Example: "Role: Privacy Lawyer. Task: Read the attached EULA document. Tell me specifically if the company has the right to sell or share my personal data with third parties. Cite the exact section and paragraph number where this is mentioned.")*

10. Best Practices

  • Citations Required: When prompting an AI to research a long document, always add the constraint: *"Provide exact quotes from the text to support your claims."* This makes hallucinations much easier to catch: you can check each quote against the document instead of trusting a summary that sounds good but isn't actually in the text.
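
Requested quotes are only useful if you check them. The following is a minimal sketch of an automated check that flags any quote not found in the source text. The function names and sample strings are hypothetical, and the match is a plain substring test after normalizing case and whitespace.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_unsupported_quotes(quotes: list[str], source_text: str) -> list[str]:
    """Return the quotes that do not appear in the source text."""
    source = _normalize(source_text)
    return [quote for quote in quotes if _normalize(quote) not in source]


# Invented sample document and quotes, for illustration only.
source_text = (
    "The company may share your data with trusted partners.\n"
    "You can opt out at any time."
)
model_quotes = [
    "may share your data with trusted partners",
    "We will never sell your data",
]

unsupported = find_unsupported_quotes(model_quotes, source_text)
print("Quotes not found in the document:", unsupported)
```

A quote that fails this check is not necessarily invented (the model may have paraphrased or changed punctuation), but it should be reviewed by hand before you rely on it.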

11. Common Mistakes

  • Exceeding the Context Window: If you paste a 300-page book into a standard LLM prompt, the request may be rejected or truncated, or the model may "forget" details from the middle of the book. For massive research, you must use RAG (Retrieval-Augmented Generation) or an AI tool specifically designed for huge context windows (like Claude 3).
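
Approaches like RAG typically begin by splitting a long document into smaller pieces. The sketch below shows only that first step, with overlapping chunks so a sentence cut at a boundary still appears whole in one of them. It is a simplification: `chunk_text` is a hypothetical helper, and it counts characters, whereas real context limits are measured in tokens.

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, each overlapping the previous one."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    chunks = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Invented stand-in for a long document (5,000 characters).
book_text = "word " * 1000
chunks = chunk_text(book_text, max_chars=2000, overlap=200)
print(f"{len(chunks)} chunks, longest is {max(len(c) for c in chunks)} characters")
```

Each chunk can then be summarized or searched on its own, and only the relevant pieces are sent to the model with the final question.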

12. Exercises

  1. Explain why you should explicitly instruct the AI to "Ignore" irrelevant information when asking it to summarize a large document.

13. MCQs with Answers

Question 1

When extracting data from unstructured text, which prompt constraint ensures the AI handles missing information predictably?

Question 2

Why is a standard LLM (without a code interpreter) highly dangerous to use for calculating the statistical average of a 10,000-row financial spreadsheet?

14. Interview Questions

  • Q: How do you engineer a prompt to extract highly structured JSON data from an unstructured, messy PDF document?
  • Q: Discuss the limitations of standard Large Language Models in performing complex arithmetic on datasets, and how modern platforms overcome this limitation.

15. FAQs

Q: Can I use AI to summarize a YouTube video?
A: Yes! Use a simple script or a website to download the "Transcript" (subtitles) of the YouTube video as text. Then paste that block of text into your AI prompt and say, "Summarize this transcript into 5 bullet points." For very long videos, check that the transcript fits within the model's context window.

16. Summary

In Chapter 12, we turned the AI into a relentless digital librarian. By precisely defining our research goals, we can prompt LLMs to extract exact variables from messy text, summarize long documents from highly specific perspectives, and quickly identify trends in qualitative data (like customer reviews). By understanding the AI's mathematical limitations and demanding exact citations that we then check against the source, we keep our research grounded in what the documents actually say.

17. Next Chapter Recommendation

Text is incredibly powerful, but generative AI can also create visuals. Proceed to Chapter 13: AI Image Prompt Engineering to learn the art of prompting for pixels.
