CHAPTER 13
Beginner
AI Image Prompt Engineering
Updated: May 14, 2026
20 min read
# CHAPTER 13
AI Image Prompt Engineering
1. Introduction
Prompt Engineering for text (ChatGPT) focuses on logic, tone, and formatting. Prompt Engineering for images (Midjourney, DALL-E 3) requires an entirely different vocabulary. You are no longer acting as a writer; you are acting as a Director of Photography. In this chapter, we will learn the specific syntax required to generate stunning, photorealistic images and digital art by commanding the camera, lighting, and style.2. Learning Objectives
By the end of this chapter, you will be able to:- Understand the difference between Text Prompts and Image Prompts.
- Structure an Image Prompt using the Subject-Environment-Camera framework.
- Master vocabulary for lighting, art styles, and rendering.
- Control the mood and composition of generated imagery.
3. Beginner-Friendly Explanation
Imagine describing a dream to a blindfolded painter. If you say, *"Paint a dog,"* the painter might paint a cartoon dog, a watercolor dog, or a terrifying realistic wolf. You left the details to chance. If you say, *"Paint a golden retriever puppy. It is sitting in a sunlit park. The style is hyper-realistic photography. The camera is a 35mm lens, blurring the background. The lighting is golden hour."* The painter knows exactly what to do. Image Prompting is the act of meticulously describing the physical, visual, and cinematic properties of an image to a Diffusion Model.4. The Anatomy of an Image Prompt
A professional image prompt contains several distinct layers, usually separated by commas:- 1. The Subject: What is the focus? (e.g., A cyborg woman, a cup of coffee).
- 2. The Action/Environment: What is happening? Where is it? (e.g., sitting in a neon-lit Tokyo diner).
- 3. The Medium/Style: Is it a photo, an oil painting, 3D art, or anime? (e.g., Cinematic photography, Impressionist painting).
- 4. The Camera & Lighting: (e.g., 85mm lens, macro shot, cinematic lighting, volumetric fog).
- 5. Quality Boosters: (e.g., 8k resolution, highly detailed, Unreal Engine 5).
5. Mastering Visual Vocabulary
To get good images, you must learn the vocabulary of photographers and artists:- Lighting: *Cinematic lighting, Golden hour, Studio lighting, Neon glow, Volumetric rays.*
- Camera Angles: *Extreme close-up, Wide shot, Drone view, Fish-eye lens, Low angle.*
- Art Styles: *Cyberpunk, Steampunk, Watercolor, Vector illustration, 3D render, Pixar style, Vintage 1950s poster.*
6. Prompt Example: Good vs. Bad
Bad Prompt (Vague):
text
*Output:* A generic, blurry, video-game style car.
Engineered Prompt (Cinematic):
text
*Output:* A breathtaking, movie-quality photograph that looks completely real.
7. Platform Differences (DALL-E vs. Midjourney)
- DALL-E 3 (OpenAI): Understands natural conversational English perfectly. You can say, "Make a funny comic about a dog," and it works perfectly. It is highly accurate at rendering exact text and signs.
-
Midjourney: The industry standard for pure artistic beauty. However, it requires highly specific, comma-separated keywords and uses specific parameters at the end of the prompt (e.g.,
--ar 16:9to make the image widescreen, or--v 6.0to use the newest model).
8. Text-to-Image API Example
Developers can generate images programmatically using the OpenAI API.
python
9. Mini Project
The Style Swapper: Take a single subject: *"A cat sitting on a chair."* Write three completely different image prompts changing *only* the Medium and Style keywords to produce:- 1. A photograph.
- 2. An oil painting.
- 3. A 3D cartoon.
10. Best Practices
- Use ChatGPT as a Prompt Writer: If you struggle to think of camera angles, ask ChatGPT to write the Midjourney prompt for you! *Prompt: "Act as an expert photographer. Write a highly detailed, 50-word Midjourney image prompt for a picture of a mountain. Include lens type and lighting."*
11. Common Mistakes
- Overcrowding the Subject: Do not ask an image generator to draw 15 different people doing 15 different things in one image. Diffusion models struggle with spatial composition. Stick to 1 or 2 main subjects for the highest quality.
12. Exercises
- 1. Explain why using keywords like "85mm lens" or "Volumetric lighting" drastically improves the quality of an AI-generated image.
13. MCQs with Answers
Question 1
What is the recommended structure for a professional AI Image Prompt?
Question 2
If you want to generate an image that looks like a frame from a high-budget Hollywood movie, which keyword should you include?
14. Interview Questions
- Q: Contrast the prompt engineering approach required for an LLM (Text) versus a Diffusion Model (Images). What specific vocabularies must a prompt engineer master for images?
- Q: Explain how you would utilize an LLM (like ChatGPT) in a chained workflow to generate superior prompts for an Image Generator (like Midjourney).