Computer Vision Comprehensive Quiz & Projects

30 questions covering the Computer Vision tutorial.

Question 1: What does a Convolutional layer do in a CNN (Convolutional Neural Network)?

A. Flattens spatial multidimensional input into a 1D vector.
B. Scans input images with mathematical kernel filters to extract features like edges, textures, and shapes. — (correct answer)
C. Optimizes the learning rate of the backpropagation algorithm.
D. Encrypts pixel arrays to prevent data tampering.

Explanation: Convolutional layers apply filters to inputs to compute feature maps, preserving spatial relationships in images.
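
Example (illustrative sketch): the sliding multiply-and-sum can be written out with plain Python lists. The function name conv2d_valid, the toy image, and the 2x2 kernel are assumptions chosen for illustration; real CNN layers learn their kernel values and run on optimized tensor libraries.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows) with stride 1 and no padding.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped before the multiply-and-sum.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark left half (0), bright right half (9).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical kernel that responds to a left-to-right brightness increase.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

The feature map is non-zero only in the column where the dark and bright halves meet, which is the spatial relationship the layer preserves.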

Question 2: What is the primary function of Max Pooling layers in image processing?

A. To increase image resolution through interpolation.
B. To reduce the spatial size of feature maps, lowering computation in later layers and helping to limit overfitting. — (correct answer)
C. To invert colors to highlight hidden edges.
D. To balance color brightness values across channels.

Explanation: Max pooling keeps only the maximum value in each window, downsampling the feature map. This gives a degree of translation invariance and reduces computation.
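
Example (illustrative sketch): a 2x2 max pool with stride 2 over a small feature map, using plain lists. The function name max_pool2d and the sample values are assumptions for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window, stepping by `stride`."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 0, 7, 5],
]
pooled = max_pool2d(fmap)
print(pooled)  # a 4x4 map becomes 2x2
```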

Question 3: How does a Sobel Filter calculate edges in an image?

A. By taking the average color of surrounding pixels.
B. By computing the intensity gradient of the image at each pixel in horizontal and vertical directions. — (correct answer)
C. By comparing the image with public dataset templates.
D. By converting the image into a frequency domain using Fourier transforms.

Explanation: The Sobel operator uses convolution kernels to approximate the horizontal and vertical derivatives of image intensity.
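
Example (illustrative sketch): the two standard 3x3 Sobel kernels applied at a single interior pixel. The helper name sobel_at and the toy image are assumptions; a library such as OpenCV would apply the kernels to the whole image at once.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal derivative
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical derivative


def sobel_at(image, row, col):
    """Return (gx, gy, magnitude) at an interior pixel of a grayscale image."""
    gx = gy = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            pixel = image[row + di][col + dj]
            gx += pixel * SOBEL_X[di + 1][dj + 1]
            gy += pixel * SOBEL_Y[di + 1][dj + 1]
    return gx, gy, math.hypot(gx, gy)


# Dark columns on the left, bright columns on the right: a vertical edge.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
gx, gy, magnitude = sobel_at(image, 1, 1)
print(gx, gy, magnitude)
```

Here the horizontal gradient is large and the vertical gradient is zero, as expected for an edge that runs top to bottom.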

Question 4: Why is Data Augmentation (e.g., flipping, rotation) widely used in training computer vision models?

A. To speed up training times on local GPUs.
B. To artificially expand the training dataset diversity, improving model generalization and robustness. — (correct answer)
C. To reduce the pixel resolution to speed up backpropagation.
D. To convert grayscale images into RGB.

Explanation: Data augmentation exposes the model to variations in orientation, lighting, and zoom, which discourages it from memorizing the exact layout of the training images.
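
Example (illustrative sketch): two simple geometric augmentations on an image stored as a list of rows. The function names are assumptions for illustration; training pipelines usually apply such transforms randomly through a library.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Reverse the row order, then transpose."""
    return [list(col) for col in zip(*image[::-1])]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
# One original sample yields three training samples.
augmented = [image, flip_horizontal(image), rotate_90_clockwise(image)]
for sample in augmented:
    print(sample)
```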

Question 5: What is the difference between Object Detection and Semantic Segmentation?

A. Object detection identifies categories, while semantic segmentation handles regression.
B. Object detection locates items using bounding boxes, while semantic segmentation classifies every single pixel in the image. — (correct answer)
C. Object detection works on videos, while semantic segmentation works only on static photos.
D. Object detection requires manual labeling, while semantic segmentation is unsupervised.

Explanation: Semantic segmentation provides fine-grained, pixel-level classifications, while object detection draws boxes around distinct entities.
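
Example (illustrative sketch): the two output formats side by side. The mask values and the helper bounding_box are hypothetical; the point is that a segmentation mask carries one label per pixel, while a detection is only four box coordinates.

```python
# Semantic segmentation output: one class label per pixel
# (0 = background, 1 = object; labels chosen for illustration).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def bounding_box(mask, label):
    """Return (x_min, y_min, x_max, y_max) covering all pixels with `label`."""
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value == label
    ]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Object detection output: a box, which is coarser than the mask.
box = bounding_box(mask, 1)
print(box)
```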

Question 6: In computer vision, how is a digital color image represented in computer memory?

A. As a single long string of base64 text.
B. As a 3D numerical matrix of pixels with width, height, and color channels (usually Red, Green, Blue). — (correct answer)
C. As an index listing coordinates.
D. As an XML structured layout.

Explanation: Color images are represented as tensors with three dimensions: height, width, and channels (RGB).
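
Example (illustrative sketch, assuming NumPy is installed): a tiny color image as a height x width x channels array. The pixel values are made up, and the channel order is assumed to be RGB here; note that OpenCV's cv2.imread() returns channels in BGR order.

```python
import numpy as np

# A hypothetical 2x3 color image: height 2, width 3, 3 channels, 8 bits each.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = (255, 0, 0)   # top-left pixel: pure red (RGB order assumed)
image[1, 2] = (0, 0, 255)   # bottom-right pixel: pure blue

height, width, channels = image.shape
red_channel = image[:, :, 0]   # 2D slice holding only the red values
print(height, width, channels)
print(red_channel)
```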

Question 7: What is the purpose of converting an image from RGB to Grayscale?

A. To increase color saturation.
B. To reduce computational complexity by simplifying a 3-channel image to a single intensity channel. — (correct answer)
C. To make the image look vintage.
D. To encrypt the image.

Explanation: Grayscale processing reduces the data channels from three to one, which speeds up operations such as edge detection that depend only on intensity.
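
Example (illustrative sketch): a weighted-sum conversion using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), the same weights OpenCV documents for its RGB-to-gray conversion. The function name and sample pixels are assumptions for illustration.

```python
def rgb_to_gray(pixel):
    """Collapse an (R, G, B) pixel to one intensity value."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


rgb_image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0)],
]
gray_image = [[rgb_to_gray(p) for p in row] for row in rgb_image]
print(gray_image)
```

Green contributes most to perceived brightness, so pure green maps to a lighter gray than pure red.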

Question 8: How does the Canny Edge Detector reduce noise before identifying edges?

A. By resizing the image.
B. By applying a Gaussian filter to smooth the image and suppress minor details. — (correct answer)
C. By inverting the color spectrum.
D. By converting the image to binary format.

Explanation: The Canny detector starts with Gaussian smoothing, followed by gradient computation, non-maximum suppression, and hysteresis thresholding.
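
Example (illustrative sketch): the smoothing step only, using a common 3x3 integer approximation of a Gaussian kernel. The kernel size, the helper name smooth_at, and the sample values are assumptions; actual Canny implementations choose their own kernel size and sigma.

```python
GAUSSIAN_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16


def smooth_at(image, row, col):
    """Gaussian-weighted average of the 3x3 neighborhood around a pixel."""
    total = sum(
        image[row + di][col + dj] * GAUSSIAN_3X3[di + 1][dj + 1]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
    )
    return total / 16


noisy = [
    [10, 10, 10],
    [10, 90, 10],   # isolated noise spike
    [10, 10, 10],
]
smoothed_center = smooth_at(noisy, 1, 1)
print(smoothed_center)
```

The spike is pulled toward its neighbors, so it is less likely to be mistaken for an edge in the gradient step.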

Question 9: Which OpenCV function reads an image file from disk?

A. cv2.read()
B. cv2.imread() — (correct answer)
C. cv2.load()
D. cv2.open()

Explanation: cv2.imread(path) decodes an image file from disk into a NumPy array (BGR channel order for color images) and returns None if the file cannot be read.

Question 10: What is the purpose of image Thresholding?

A. Rotating the image by 90 degrees.
B. Converting a grayscale image into a binary image (black and white) by comparing pixel intensities against a cutoff value. — (correct answer)
C. Multiplying pixel values to increase brightness.
D. Compressing the image size.

Explanation: Thresholding isolates objects of interest from backgrounds, mapping pixels to 0 or 255.
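
Example (illustrative sketch): binary thresholding on a grayscale image stored as a list of rows. The function name, cutoff, and sample values are assumptions; pixels strictly above the cutoff become the maximum value, which matches the behavior of OpenCV's THRESH_BINARY mode.

```python
def threshold_binary(gray, cutoff=127, max_value=255):
    """Map each pixel to max_value if it exceeds cutoff, otherwise to 0."""
    return [[max_value if p > cutoff else 0 for p in row] for row in gray]


gray = [
    [12, 130, 200],
    [127, 128, 90],
]
binary = threshold_binary(gray)
print(binary)
```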

Question 11: What is the core architecture difference between AlexNet and ResNet?

A. AlexNet is deep, while ResNet is shallow.
B. ResNet introduces skip connections (residual blocks) that bypass layers, preventing the vanishing gradient problem in extremely deep networks. — (correct answer)
C. AlexNet works only on black and white images.
D. ResNet does not use convolutional layers.

Explanation: Skip connections let gradients flow directly back through the network, which mitigates vanishing gradients and makes networks with hundreds of layers (experimentally, even more than a thousand) trainable.
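
Example (illustrative sketch): a residual block computes F(x) + x, so the input always has a direct path to the output. The functions residual_block and tiny_layer are hypothetical stand-ins; a real ResNet block uses convolutions, normalization, and activations for F.

```python
def residual_block(x, layer):
    """Return layer(x) + x element-wise: the skip connection adds the input back."""
    fx = layer(x)
    return [f + xi for f, xi in zip(fx, x)]


def tiny_layer(x):
    # Hypothetical stand-in for the learned layers F; its output is nearly zero.
    return [0.01 * v for v in x]


x = [1.0, -2.0, 3.0]
y = residual_block(x, tiny_layer)
print(y)
```

Even though the learned part contributes almost nothing here, the output stays close to the input. The derivative of F(x) + x with respect to x includes a constant 1 from the skip path, which is why the gradient does not vanish through the block.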

Question 12: What does 'Pixel' stand for?

A. Picture Element — (correct answer)
B. Processing Element
C. Program Entry
D. Position Entity

Explanation: A pixel is the smallest addressable element of a digital image; each one stores an intensity or color value.

Question 13: In image processing, what is a 'Kernel'?

A. The core operating system of the GPU.
B. A small matrix (e.g. 3x3) of numbers used in convolution operations to apply effects like blurring or sharpening. — (correct answer)
C. The image storage header.
D. A compression codec.

Explanation: Kernels slide over images, computing dot products to yield new pixel values in feature maps.

Question 14: What is the difference between image classification and image localization?

A. Classification identifies if an object is present, while localization draws a bounding box around the detected object. — (correct answer)
B. Localization works only on videos.
C. Classification is unsupervised, while localization is supervised.
D. There is no difference.

Explanation: Classification answers 'what'; localization answers 'what' and 'where' (bounding box coordinates).

Question 15: What color space model is designed based on Hue, Saturation, and Value/Brightness?

A. RGB
B. HSV — (correct answer)
C. CMYK
D. YUV

Explanation: HSV decouples color intensity (Value) from chromaticity (Hue, Saturation), making it ideal for color filtering.

Question 16: Which OpenCV function displays an image in a window?

A. cv2.show()
B. cv2.imshow() — (correct answer)
C. cv2.display()
D. cv2.view()

Explanation: cv2.imshow(window_name, image_array) displays an image array in a named window on screen.

Question 17: What does Mean Average Precision (mAP) measure in object detection?

A. The processing speed of the camera stream.
B. A standard metric evaluating the accuracy of bounding boxes and class predictions across IoU thresholds. — (correct answer)
C. The image compression ratio.
D. The color brightness average.

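Explanation: mAP is the mean of the per-class Average Precision (AP), where AP summarizes the precision-recall curve for one class. A detection counts as correct only if its IoU with a ground-truth box meets the chosen threshold, so mAP reflects both localization and classification quality.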
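
Example (illustrative sketch): a simple, non-interpolated AP per class, then the mean across classes. The function name, class names, and match flags are made up; benchmarks such as PASCAL VOC and COCO use interpolated variants and, for COCO, average over several IoU thresholds.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Non-interpolated AP for one class.

    `is_true_positive` lists detections sorted by descending confidence;
    each flag says whether that detection matched a ground-truth box
    at the chosen IoU threshold.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_tp in enumerate(is_true_positive, start=1):
        if is_tp:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    return precision_sum / num_ground_truth


ap_per_class = {
    "cat": average_precision([True, False, True], num_ground_truth=2),
    "dog": average_precision([True, True], num_ground_truth=2),
}
mean_ap = sum(ap_per_class.values()) / len(ap_per_class)
print(ap_per_class, mean_ap)
```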

Question 18: What does Intersection over Union (IoU) evaluate?

A. The connection speed of servers.
B. The overlap accuracy of a predicted bounding box compared to the ground-truth bounding box. — (correct answer)
C. The ratio of colors in an image.
D. The resolution density.

Explanation: IoU divides the area where the predicted and ground-truth boxes overlap by the area of their union, giving a score from 0 (no overlap) to 1 (perfect match).
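
Example (illustrative sketch): IoU for two axis-aligned boxes. The (x_min, y_min, x_max, y_max) box format and the function name iou are assumptions; some datasets use (x, y, width, height) instead.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection   # overlap counted only once
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
score = iou(predicted, ground_truth)
print(score)  # overlap of 1 over a union of 7
```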

Question 19: What is the purpose of Non-Maximum Suppression (NMS) in object detection algorithms?

A. Removing background colors.
B. Eliminating redundant, overlapping bounding boxes for the same object, keeping only the box with the highest confidence score. — (correct answer)
C. Resizing bounding boxes to standard sizes.
D. Speeding up GPU backpropagation.

Explanation: NMS sorts the boxes predicted for a class by confidence, keeps the highest-scoring box, discards any remaining box whose IoU with it exceeds a threshold, and repeats with the boxes that are left.
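
Example (illustrative sketch): greedy NMS for the detections of a single class. The function names, the 0.5 threshold, and the sample boxes are assumptions; detection libraries ship optimized versions of this procedure.

```python
def iou(a, b):
    """IoU for boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs that share one class label."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)          # highest-confidence box left
        kept.append(best)
        # Drop every box that overlaps the kept box too much.
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept


detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.8),       # near-duplicate of the first box
    ((100, 100, 140, 140), 0.7),   # a separate object
]
kept = nms(detections)
print(kept)
```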

Question 20: Which library is a popular scientific computing package representing images as multidimensional arrays?

A. Pandas
B. NumPy — (correct answer)
C. requests
D. Jinja2

Explanation: NumPy stores images as n-dimensional arrays and runs element-wise operations in vectorized, compiled code, which makes pixel arithmetic fast.

Question 21: What does the HSV color model's 'Hue' dimension represent?

A. The purity of the color.
B. The base color itself, expressed as an angle from 0 to 360 degrees. — (correct answer)
C. The brightness of the color.
D. The opacity of the color.

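Explanation: Hue maps colors onto a wheel (e.g., red is 0, green is 120, blue is 240). Note that OpenCV stores hue for 8-bit images as 0 to 179 (degrees divided by 2) so the value fits in one byte.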
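
Example (illustrative sketch): hue angles computed with Python's standard colorsys module, which expects RGB values scaled to the range 0 to 1 and returns hue as a fraction of a full turn. The helper name hue_degrees is an assumption for illustration.

```python
import colorsys


def hue_degrees(r, g, b):
    """Hue of an 8-bit RGB color as an angle in degrees."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360)


colors = {
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}
hues = {name: hue_degrees(*rgb) for name, rgb in colors.items()}
print(hues)
```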

Question 22: In CNNs, what does a 1x1 Convolution accomplish?

A. It resizes image width and height to 1 pixel.
B. It combines information across channels at each pixel, typically reducing the number of feature channels (depth) without altering spatial resolution. — (correct answer)
C. It acts as a static identity matrix.
D. It resets weight parameters.

Explanation: A 1x1 convolution applies a learned linear combination of the channels at each pixel. It acts as a projection layer, usually reducing channel dimensions to save computation.
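
Example (illustrative sketch): a 1x1 convolution that projects 4 channels down to 2 while the 2x2 spatial grid stays unchanged. The function name conv1x1 and the weight values are hypothetical; in a trained network the weights are learned.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map[y][x] is a list of C_in channel values;
    weights has one row of C_in values per output channel.
    """
    return [
        [
            [sum(w * v for w, v in zip(w_row, pixel)) for w_row in weights]
            for pixel in row
        ]
        for row in feature_map
    ]


# 2x2 spatial grid, 4 channels per pixel.
feature_map = [
    [[1, 2, 3, 4], [0, 1, 0, 1]],
    [[2, 2, 2, 2], [4, 3, 2, 1]],
]
# Hypothetical weights projecting 4 channels to 2.
weights = [
    [1, 0, 0, 1],    # output channel 0 = channel 0 + channel 3
    [0, 1, -1, 0],   # output channel 1 = channel 1 - channel 2
]
projected = conv1x1(feature_map, weights)
print(projected)
```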

Question 23: What is a grayscale image pixel range in 8-bit representation?

A. 0 to 100
B. 0 to 255 — (correct answer)
C. -128 to 127
D. 0 to 1

Explanation: 8-bit intensity ranges from 0 (pure black) to 255 (pure white).

Question 24: What does the 'stride' parameter define in a convolutional layer?

A. The learning rate of the optimizer.
B. The step size or pixel jump the kernel filter takes when scanning across the input image. — (correct answer)
C. The thickness of the border lines.
D. The batch size of image inputs.

Explanation: Larger strides produce smaller outputs (e.g., a stride of 2 roughly halves each spatial dimension).
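
Example (illustrative sketch): the usual output-size formula for one spatial dimension, floor((n + 2p - k) / s) + 1. The function name and the 32-pixel input are assumptions for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


sizes = {
    "stride 1": conv_output_size(32, 3, stride=1, padding=1),
    "stride 2": conv_output_size(32, 3, stride=2, padding=1),
}
print(sizes)  # stride 2 halves the 32-pixel dimension
```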

Question 25: How does Instance Segmentation differ from Semantic Segmentation?

A. Instance segmentation works only in real-time video feeds.
B. Semantic segmentation labels pixels by class category, while Instance segmentation distinguishes between individual objects of the same class. — (correct answer)
C. Instance segmentation does not classify pixels.
D. There is no difference.

Explanation: Semantic segmentation groups all 'cars' in one color; Instance segmentation colors each distinct 'car' differently.

Question 26: Which function saves an image array to a file in OpenCV?

A. cv2.save()
B. cv2.imwrite() — (correct answer)
C. cv2.export()
D. cv2.write()

Explanation: cv2.imwrite(path, image_array) encodes the array and saves it to disk; the format (PNG, JPEG, and others) is chosen from the file extension.

Question 27: What does a 'Histogram' of an image show?

A. The history of modifications made to the file.
B. The distribution of pixel intensity values, displaying the count of pixels at each gray/color level. — (correct answer)
C. The coordinates of bounding boxes.
D. The configuration properties of the camera.

Explanation: A histogram reveals an image's contrast and exposure at a glance, which helps when adjusting brightness or choosing a threshold.
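
Example (illustrative sketch): counting how many pixels fall at each of the 256 gray levels of an 8-bit image. The function name and sample values are assumptions for illustration.

```python
def histogram(gray, levels=256):
    """Count the pixels at each intensity level of a grayscale image."""
    counts = [0] * levels
    for row in gray:
        for pixel in row:
            counts[pixel] += 1
    return counts


gray = [
    [0, 0, 255],
    [128, 255, 255],
]
hist = histogram(gray)
print(hist[0], hist[128], hist[255])
```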

Question 28: What is the difference between a Pooling layer and a Convolutional layer?

A. Pooling layers have trainable weights, while Convolutional layers do not.
B. Convolutional layers extract features using weights, while Pooling layers downsample maps using static formulas (like Max or Average). — (correct answer)
C. Pooling is used only for text classification.
D. There is no difference.

Explanation: Pooling layers have no trainable weights; they reduce spatial dimensions with a fixed operation such as max or average.

Question 29: Why is padding (e.g., 'same' padding) applied to images before convolutions?

A. To decrease image brightness.
B. To prevent the spatial dimensions of the feature map from shrinking after scanning with a kernel. — (correct answer)
C. To encrypt the image border pixels.
D. To increase the processing speed.

Explanation: Padding adds a border (usually zeros) so the kernel can also be centered on edge pixels. With 'same' padding and a stride of 1, the output keeps the input's height and width.
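
Example (illustrative sketch): zero-padding a 3x3 image by one pixel so that a 3x3 kernel with stride 1 yields a 3x3 output again. The function name zero_pad and the sample image are assumptions for illustration.

```python
def zero_pad(image, pad=1):
    """Surround an image (list of rows) with a border of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]          # top border
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))      # bottom border
    return padded


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
padded = zero_pad(image)

# A 3x3 kernel with stride 1 fits (size - 3 + 1) times along each dimension.
out_h = len(padded) - 3 + 1
out_w = len(padded[0]) - 3 + 1
print(out_h, out_w)  # same as the unpadded input
```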

Question 30: What is the function of the cv2.waitKey(0) statement?

A. Pauses the CPU thread for 10 seconds.
B. Pauses the program and waits indefinitely for a key press, keeping any OpenCV windows on screen. — (correct answer)
C. Records video frames.
D. Closes the terminal session.

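Explanation: cv2.waitKey(0) blocks until a key is pressed and returns that key's code. While waiting, it lets OpenCV process window events, so images shown with cv2.imshow() stay visible. The windows are closed by cv2.destroyAllWindows() or when the program exits.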
