Skip to main content
Prompt Engineering Tutorial
CHAPTER 13 Beginner

AI Image Prompt Engineering

Updated: May 14, 2026
20 min read

# CHAPTER 13

AI Image Prompt Engineering

1. Introduction

Prompt Engineering for text (ChatGPT) focuses on logic, tone, and formatting. Prompt Engineering for images (Midjourney, DALL-E 3) requires an entirely different vocabulary. You are no longer acting as a writer; you are acting as a Director of Photography. In this chapter, we will learn the specific syntax required to generate stunning, photorealistic images and digital art by commanding the camera, lighting, and style.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Understand the difference between Text Prompts and Image Prompts.
  • Structure an Image Prompt using the Subject-Environment-Camera framework.
  • Master vocabulary for lighting, art styles, and rendering.
  • Control the mood and composition of generated imagery.

3. Beginner-Friendly Explanation

Imagine describing a dream to a blindfolded painter. If you say, *"Paint a dog,"* the painter might paint a cartoon dog, a watercolor dog, or a terrifying realistic wolf. You left the details to chance. If you say, *"Paint a golden retriever puppy. It is sitting in a sunlit park. The style is hyper-realistic photography. The camera is a 35mm lens, blurring the background. The lighting is golden hour."* The painter knows exactly what to do. Image Prompting is the act of meticulously describing the physical, visual, and cinematic properties of an image to a Diffusion Model.

4. The Anatomy of an Image Prompt

A professional image prompt contains several distinct layers, usually separated by commas:
  1. 1. The Subject: What is the focus? (e.g., A cyborg woman, a cup of coffee).
  1. 2. The Action/Environment: What is happening? Where is it? (e.g., sitting in a neon-lit Tokyo diner).
  1. 3. The Medium/Style: Is it a photo, an oil painting, 3D art, or anime? (e.g., Cinematic photography, Impressionist painting).
  1. 4. The Camera & Lighting: (e.g., 85mm lens, macro shot, cinematic lighting, volumetric fog).
  1. 5. Quality Boosters: (e.g., 8k resolution, highly detailed, Unreal Engine 5).

5. Mastering Visual Vocabulary

To get good images, you must learn the vocabulary of photographers and artists:
  • Lighting: *Cinematic lighting, Golden hour, Studio lighting, Neon glow, Volumetric rays.*
  • Camera Angles: *Extreme close-up, Wide shot, Drone view, Fish-eye lens, Low angle.*
  • Art Styles: *Cyberpunk, Steampunk, Watercolor, Vector illustration, 3D render, Pixar style, Vintage 1950s poster.*

6. Prompt Example: Good vs. Bad

Bad Prompt (Vague):
text
1
A cool futuristic car.

*Output:* A generic, blurry, video-game style car.

Engineered Prompt (Cinematic):

text
123
A sleek futuristic hover-car driving down a rain-slicked cyberpunk street at night. 
Cinematic photography, neon pink and cyan lighting reflecting in puddles. 
Shot on 35mm lens, depth of field, photorealistic, 8k resolution.

*Output:* A breathtaking, movie-quality photograph that looks completely real.

7. Platform Differences (DALL-E vs. Midjourney)

  • DALL-E 3 (OpenAI): Understands natural conversational English perfectly. You can say, "Make a funny comic about a dog," and it works perfectly. It is highly accurate at rendering exact text and signs.
  • Midjourney: The industry standard for pure artistic beauty. However, it requires highly specific, comma-separated keywords and uses specific parameters at the end of the prompt (e.g., --ar 16:9 to make the image widescreen, or --v 6.0 to use the newest model).

8. Text-to-Image API Example

Developers can generate images programmatically using the OpenAI API.
python
1234567891011121314
import openai
client = openai.OpenAI()

prompt = "A futuristic cyberpunk city skyline, cinematic lighting, 4k"

response = client.images.generate(
  model="dall-e-3",
  prompt=prompt,
  size="1024x1024",
  quality="hd",
  n=1,
)

print(f"Image URL: {response.data[0].url}")

9. Mini Project

The Style Swapper: Take a single subject: *"A cat sitting on a chair."* Write three completely different image prompts changing *only* the Medium and Style keywords to produce:
  1. 1. A photograph.
  1. 2. An oil painting.
  1. 3. A 3D cartoon.
*(Answer Example: 1. "A cat sitting on a chair, photorealistic 85mm portrait, studio lighting." 2. "A cat sitting on a chair, impressionist oil painting, brush strokes, Van Gogh style." 3. "A cat sitting on a chair, 3D Pixar animation style, vibrant colors, soft lighting.")*

10. Best Practices

  • Use ChatGPT as a Prompt Writer: If you struggle to think of camera angles, ask ChatGPT to write the Midjourney prompt for you! *Prompt: "Act as an expert photographer. Write a highly detailed, 50-word Midjourney image prompt for a picture of a mountain. Include lens type and lighting."*

11. Common Mistakes

  • Overcrowding the Subject: Do not ask an image generator to draw 15 different people doing 15 different things in one image. Diffusion models struggle with spatial composition. Stick to 1 or 2 main subjects for the highest quality.

12. Exercises

  1. 1. Explain why using keywords like "85mm lens" or "Volumetric lighting" drastically improves the quality of an AI-generated image.

13. MCQs with Answers

Question 1

What is the recommended structure for a professional AI Image Prompt?

Question 2

If you want to generate an image that looks like a frame from a high-budget Hollywood movie, which keyword should you include?

14. Interview Questions

  • Q: Contrast the prompt engineering approach required for an LLM (Text) versus a Diffusion Model (Images). What specific vocabularies must a prompt engineer master for images?
  • Q: Explain how you would utilize an LLM (like ChatGPT) in a chained workflow to generate superior prompts for an Image Generator (like Midjourney).

15. FAQs

Q: Can AI put exact words on an image, like a logo? A: Historically, no. Older models would hallucinate weird, alien text. However, DALL-E 3 and Midjourney v6 can now render exact text. Just put the text in quotes in your prompt: *"A neon sign that says 'OPEN LATE'."*

16. Summary

In Chapter 13, we traded our keyboards for digital cameras. Prompt engineering for images requires mastering visual vocabulary. By breaking our prompt into Subject, Environment, Style, and Lighting, we leave nothing to chance. Whether we need a photorealistic cinematic shot or a 3D animated character, applying specific camera lenses and rendering keywords forces the Diffusion model to produce professional-grade art.

17. Next Chapter Recommendation

Your prompt is good, but it can always be better. Proceed to Chapter 14: Prompt Optimization and Refinement to learn how to test and perfect your prompts.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·