CHAPTER 12
Beginner
Prompting for Data Analysis and Research
Updated: May 14, 2026
25 min read
1. Introduction
Historically, if you wanted to find a specific trend in a 50-page PDF or a large Excel spreadsheet, you had to read it manually or write complex SQL queries. Today, LLMs can ingest long documents in seconds, extract key data points, and surface trends that are easy to miss. In this chapter, we will explore how to use Prompt Engineering to turn an AI into a tireless and, with the right safeguards, reliable research assistant.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Use prompts for rapid document summarization.
- Engineer prompts to extract specific data from messy text.
- Utilize AI for sentiment analysis and trend identification.
- Understand the limits of AI math in data analysis.
3. Beginner-Friendly Explanation
Imagine walking into a massive library to research the American Civil War. Without AI, you have to read 20 books to find the exact date and location of one specific battle. With AI Data Analysis, you hand all 20 books to a digital librarian and say, *"Read all of these right now. Hand me back a single piece of paper with a list of every battle, the date it happened, and who won, formatted as a table."* The digital librarian does it in 3 seconds. By engineering the prompt to extract specific data, you bypass the reading process entirely and jump straight to the analysis.
4. The Summarization Prompt
Summarization is the most common use case for AI research. However, a vague *"Summarize this"* prompt yields generic results. You must define the *perspective* of the summary: who it is for and what they need from it.
Engineered Summarization Prompt:
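As a minimal sketch of what an engineered summarization prompt can look like, the function below pins down the audience, the focus, and the length before the document is attached. The helper name `build_summary_prompt`, its parameters, and the exact wording are illustrative assumptions, not a fixed recipe.

```python
def build_summary_prompt(document: str, audience: str, focus: str, max_bullets: int = 5) -> str:
    """Assemble a summarization prompt that states the perspective explicitly.

    The structure (role, task, focus, constraint) is an illustrative assumption.
    """
    return (
        f"Role: You are an analyst writing for {audience}.\n"
        f"Task: Summarize the document below in at most {max_bullets} bullet points.\n"
        f"Focus: Cover only {focus}. Ignore anything unrelated to that focus.\n"
        "Constraint: Use only facts stated in the document.\n\n"
        f"<document>\n{document}\n</document>"
    )


# Example usage with a made-up two-sentence document.
prompt = build_summary_prompt(
    document="Q3 revenue rose 4%. The office moved to a new building.",
    audience="a Chief Financial Officer",
    focus="financial risks and revenue changes",
)
print(prompt)
```

Changing only `audience` and `focus` produces a different summary of the same document, which is the point of defining the perspective up front.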
5. Data Extraction (Messy to Clean)
AI is brilliant at taking messy, unstructured text (like a scrambled email) and converting it into clean, structured data (like a spreadsheet or JSON).
The Extraction Prompt:
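One way to make extraction predictable is to name the exact keys you want and tell the model what to do when a value is missing. The sketch below assumes a hypothetical four-field schema (`FIELDS`) and a hand-written sample reply; it does not call any real model.

```python
import json

# Hypothetical schema for a customer-support email; adjust to your own data.
FIELDS = ["name", "email", "order_id", "issue"]


def build_extraction_prompt(message: str) -> str:
    """Ask for a JSON object with fixed keys and an explicit rule for missing values."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from the message below and return "
        f"a single JSON object with exactly these keys: {keys}.\n"
        "If a field is not present in the message, set its value to null. "
        "Do not guess.\n"
        "Return only the JSON, with no extra commentary.\n\n"
        f"<message>\n{message}\n</message>"
    )


def parse_extraction(response_text: str) -> dict:
    """Parse the model's reply and normalise it to the expected keys."""
    data = json.loads(response_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Keep only the expected keys; anything the model omitted becomes None.
    return {key: data.get(key) for key in FIELDS}


# A reply of the kind the prompt asks for (hand-written, not real model output).
sample_reply = '{"name": "Dana Lee", "email": null, "order_id": "A-1042", "issue": "late delivery"}'
record = parse_extraction(sample_reply)
print(record)
```

Parsing the reply with `json.loads` means a malformed answer fails loudly instead of slipping into your spreadsheet unnoticed.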
6. Sentiment Analysis and Trend Spotting
If a business receives 5,000 customer reviews, reading them all by hand is impractical. You can prompt the AI to find the hidden trends.
The Trend Prompt:
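A trend prompt usually asks for two things: a sentiment label per review and the recurring themes across all of them. The sketch below is one possible shape, with hypothetical helper names (`build_trend_prompt`, `tally_sentiment`). It numbers the reviews so the model can point back to them, and it counts the labels in Python rather than asking the model to count.

```python
from collections import Counter


def build_trend_prompt(reviews: list[str], top_n: int = 3) -> str:
    """Number each review so the model can cite which ones support a trend."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(reviews, start=1))
    return (
        "You are a customer-insights analyst.\n"
        f"Read the {len(reviews)} reviews below. For each one, decide whether the "
        "sentiment is positive, negative, or neutral.\n"
        f"Then list the {top_n} most frequent complaints, each with the numbers of "
        "the reviews that mention it.\n\n"
        f"Reviews:\n{numbered}"
    )


def tally_sentiment(labels: list[str]) -> Counter:
    """Count sentiment labels in code instead of trusting the model's arithmetic."""
    return Counter(label.strip().lower() for label in labels)


trend_prompt = build_trend_prompt(["Great app", "Crashes on login"])
print(trend_prompt)
```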
7. The Math Limitation (Advanced Data Analysis)
*Warning:* Standard LLMs (like the text box in basic ChatGPT) cannot do math reliably. If you give an LLM a spreadsheet and say, *"Calculate the average revenue,"* it may return a plausible-looking number that is simply wrong.
*The Solution:* Modern tools (like ChatGPT Advanced Data Analysis) have the LLM write a Python script behind the scenes, run the calculation in Python, and report the computed result. Always use "Code Interpreter" or specialized data tools when dealing with heavy numbers.
8. Python Example: Batch Processing Reviews
Developers use APIs to analyze thousands of data points automatically.
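The following is a minimal sketch of batch processing, not a specific vendor's API. It assumes you supply your own `ask_model` function that sends a prompt string to whichever LLM service you use and returns the reply as text. Here a stand-in called `fake_model` plays that role so the example runs offline. The totals and percentages are computed by Python, not by the model.

```python
from collections import Counter
from typing import Callable, Iterable

PROMPT_TEMPLATE = (
    "Classify the sentiment of this customer review as exactly one word: "
    "positive, negative, or neutral.\n\nReview: {review}"
)
ALLOWED = {"positive", "negative", "neutral"}


def analyze_reviews(reviews: Iterable[str], ask_model: Callable[[str], str]) -> Counter:
    """Send one prompt per review and tally the returned labels."""
    counts = Counter()
    for review in reviews:
        reply = ask_model(PROMPT_TEMPLATE.format(review=review))
        label = reply.strip().lower().rstrip(".")
        # Replies outside the allowed labels are recorded separately, not trusted.
        counts[label if label in ALLOWED else "unparsed"] += 1
    return counts


def fake_model(prompt: str) -> str:
    """Stand-in for a real API call (an assumption so the sketch runs offline)."""
    return "Negative." if "broke" in prompt else "positive"


counts = analyze_reviews(["Love it", "It broke in a week", "Works fine"], fake_model)
total = sum(counts.values())
negative_share = counts["negative"] / total  # arithmetic done in Python
print(counts, f"{negative_share:.0%} negative")
```

To run this against a real service, replace `fake_model` with a function that wraps your provider's client library. The rest of the loop stays the same.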
9. Mini Project
The Legal Assistant: You are given a 30-page End User License Agreement (EULA). Write a prompt to find out if the company is allowed to sell your personal data. *(Answer Example: "Role: Privacy Lawyer. Task: Read the attached EULA document. Tell me specifically if the company has the right to sell or share my personal data with third parties. Cite the exact section and paragraph number where this is mentioned.")*
10. Best Practices
- Citations Required: When prompting an AI to research a long document, always add the constraint: *"Provide exact quotes from the text to support your claims."* This makes it much harder for the AI to hallucinate a summary that sounds good but isn't actually in the document, because every claim can be checked against the quoted passage.
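Asking for exact quotes only helps if you check them. As a rough sketch (the function names are hypothetical, and it assumes the model wraps quotes in straight double quotation marks), the code below lists any quoted passage in an answer that does not appear in the source document.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(answer: str, source: str) -> list[str]:
    """Return quoted passages from the answer that are not found in the source."""
    quotes = re.findall(r'"([^"]+)"', answer)
    haystack = normalize(source)
    return [q for q in quotes if normalize(q) not in haystack]


source = "Section 4.2: We may share your personal data\nwith advertising partners."
answer = (
    'The EULA says "share your personal data with advertising partners" '
    'and "sell your data to anyone".'
)
missing = unverified_quotes(answer, source)
print(missing)
```

Any quote that lands in `missing` should be treated as unsupported until you find it in the document yourself.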
11. Common Mistakes
- Exceeding the Context Window: If you paste a 300-page book into a standard LLM prompt, the request may be rejected or truncated, or the model may lose track of details from the middle of the book. For massive research, you must use RAG (Retrieval-Augmented Generation), which retrieves only the passages relevant to each question, or an AI tool specifically designed for huge context windows (like Claude 3).
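The first step in most workarounds is splitting the document into pieces that fit. The sketch below shows only that splitting step, under two assumptions: word count is used as a rough stand-in for tokens (real limits are measured in tokens), and the function name and default sizes are illustrative. A full RAG pipeline would also index the chunks and retrieve the relevant ones per question.

```python
def chunk_words(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size words, overlapping by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Tiny demonstration with ten "words" so the overlap is easy to see.
demo = chunk_words(" ".join(str(i) for i in range(10)), chunk_size=4, overlap=1)
print(demo)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of sending a few words twice.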
12. Exercises
1. Explain why you should explicitly instruct the AI to "Ignore" irrelevant information when asking it to summarize a large document.
13. MCQs with Answers
Question 1
When extracting data from unstructured text, which prompt constraint ensures the AI handles missing information predictably?
Question 2
Why is a standard LLM (without a code interpreter) highly dangerous to use for calculating the statistical average of a 10,000-row financial spreadsheet?
14. Interview Questions
- Q: How do you engineer a prompt to extract highly structured JSON data from an unstructured, messy PDF document?
- Q: Discuss the limitations of standard Large Language Models in performing complex arithmetic on datasets, and how modern platforms overcome this limitation.