Computer Vision Comprehensive Quiz & Projects
30 questions on the Computer Vision Tutorial.
Question 1: What does a Convolutional layer do in a CNN (Convolutional Neural Network)?
- A. Flattens spatial multidimensional input into a 1D vector.
- B. Scans input images with mathematical kernel filters to extract features like edges, textures, and shapes. β (correct answer)
- C. Optimizes the learning rate of the backpropagation algorithm.
- D. Encrypts pixel arrays to prevent data tampering.
Explanation: Convolutional layers apply filters to inputs to compute feature maps, preserving spatial relationships in images.
Question 2: What is the primary function of Max Pooling layers in image processing?
- A. To increase image resolution through interpolation.
- B. To reduce the spatial size of feature maps, cutting computation and downstream parameter count and helping to limit overfitting. β (correct answer)
- C. To invert colors to highlight hidden edges.
- D. To balance color brightness values across channels.
Explanation: Max pooling keeps the maximum value in each window, downsampling the feature map. This saves compute and makes the output tolerant of small translations of the input.
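The windowed maximum described above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes a 2x2 window with stride 2 and a single-channel feature map; the helper name max_pool_2x2 is hypothetical and not part of any library.

```python
import numpy as np

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """Downsample a 2D feature map with a 2x2 window and stride 2."""
    h, w = feature_map.shape
    # Drop a trailing row/column if the size is odd so the windows tile evenly.
    trimmed = feature_map[: h - h % 2, : w - w % 2]
    # View the map as (h/2, 2, w/2, 2) blocks, then take the max of each block.
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
])
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6 2]
               #  [2 9]]
```

Each output value summarizes four input values, so the 4x4 map shrinks to 2x2 without any trainable weights being involved.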
Question 3: How does a Sobel Filter calculate edges in an image?
- A. By taking the average color of surrounding pixels.
- B. By computing the intensity gradient of the image at each pixel in horizontal and vertical directions. β (correct answer)
- C. By comparing the image with public dataset templates.
- D. By converting the image into a frequency domain using Fourier transforms.
Explanation: The Sobel operator uses convolution kernels to approximate the horizontal and vertical derivatives of image intensity.
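As a rough sketch of the gradient computation, the snippet below slides the two standard 3x3 Sobel kernels over a tiny synthetic image. The helper filter2d_valid is a hypothetical name, it uses no padding, and it computes cross-correlation (the kernel is not flipped), which is how most vision libraries apply filters.

```python
import numpy as np

# Standard Sobel kernels: SOBEL_X responds to horizontal intensity change,
# SOBEL_Y (its transpose) responds to vertical intensity change.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 5x5 test image: dark on the left, bright on the right (one vertical edge).
image = np.zeros((5, 5))
image[:, 3:] = 255.0

gx = filter2d_valid(image, SOBEL_X)   # horizontal derivative
gy = filter2d_valid(image, SOBEL_Y)   # vertical derivative
magnitude = np.hypot(gx, gy)          # edge strength per pixel

print(gx[0])  # [   0. 1020. 1020.]
print(gy[0])  # [0. 0. 0.]
```

The horizontal derivative is large where the window straddles the edge, while the vertical derivative stays at zero because every row of the image is identical.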
Question 4: Why is Data Augmentation (e.g., flipping, rotation) widely used in training computer vision models?
- A. To speed up training times on local GPUs.
- B. To artificially expand the training dataset diversity, improving model generalization and robustness. β (correct answer)
- C. To reduce the pixel resolution to speed up backpropagation.
- D. To convert grayscale images into RGB.
Explanation: Data augmentation exposes the model to variations in orientation, lighting, and zoom, which discourages it from memorizing the exact layout of the training images.
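A few geometric augmentations can be sketched with plain NumPy array operations. The function name augment is hypothetical, and real training pipelines usually apply such transforms randomly per batch rather than enumerating them as done here.

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Return the original image plus three simple geometric variants."""
    return [
        image,
        np.fliplr(image),      # horizontal flip (mirror left/right)
        np.flipud(image),      # vertical flip (mirror top/bottom)
        np.rot90(image, k=1),  # rotate 90 degrees counterclockwise
    ]

image = np.arange(6).reshape(2, 3)   # tiny 2x3 stand-in for a real image
variants = augment(image)

print(variants[1])        # [[2 1 0]
                          #  [5 4 3]]
print(variants[3].shape)  # (3, 2)
```

One labeled image yields four training samples here; note that the rotation swaps height and width, so non-square inputs may need resizing afterwards.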
Question 5: What is the difference between Object Detection and Semantic Segmentation?
- A. Object detection identifies categories, while semantic segmentation handles regression.
- B. Object detection locates items using bounding boxes, while semantic segmentation classifies every single pixel in the image. β (correct answer)
- C. Object detection works on videos, while semantic segmentation works only on static photos.
- D. Object detection requires manual labeling, while semantic segmentation is unsupervised.
Explanation: Semantic segmentation provides fine-grained, pixel-level classifications, while object detection draws boxes around distinct entities.
Question 6: In computer vision, how is a digital color image represented in computer memory?
- A. As a single long string of base64 text.
- B. As a 3D numerical matrix of pixels with width, height, and color channels (usually Red, Green, Blue). β (correct answer)
- C. As an index listing coordinates.
- D. As an XML structured layout.
Explanation: Color images are represented as tensors with three dimensions: height, width, and channels (RGB).
Question 7: What is the purpose of converting an image from RGB to Grayscale?
- A. To increase color saturation.
- B. To reduce computational complexity by simplifying a 3-channel image to a single intensity channel. β (correct answer)
- C. To make the image look vintage.
- D. To encrypt the image.
Explanation: Grayscale processing reduces the data channels from three to one, speeding up edge-detection algorithms.
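The channel reduction can be illustrated with a weighted sum over the last axis of an RGB array. The weights below are the widely used luma coefficients (0.299, 0.587, 0.114); treating them as the conversion rule is an assumption of this sketch, and rgb_to_gray is a hypothetical helper name.

```python
import numpy as np

# Common luma weights for R, G, B (an assumption of this sketch).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) RGB array into an (H, W) intensity array."""
    gray = rgb.astype(float) @ LUMA_WEIGHTS   # weighted sum over channels
    return np.round(gray).astype(np.uint8)

rgb = np.zeros((2, 2, 3), dtype=np.uint8)    # height 2, width 2, 3 channels
rgb[0, 0] = [255, 255, 255]                  # white pixel
rgb[0, 1] = [255, 0, 0]                      # pure red pixel

gray = rgb_to_gray(rgb)
print(rgb.shape, "->", gray.shape)  # (2, 2, 3) -> (2, 2)
print(gray[0, 0], gray[0, 1])       # 255 76
```

The result holds one value per pixel instead of three, which is the saving the explanation refers to.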
Question 8: How does the Canny Edge Detector reduce noise before identifying edges?
- A. By resizing the image.
- B. By applying a Gaussian filter to smooth the image and suppress minor details. β (correct answer)
- C. By inverting the color spectrum.
- D. By converting the image to binary format.
Explanation: The Canny detector starts with Gaussian smoothing, followed by gradient computation, non-maximum suppression, and hysteresis thresholding.
Question 9: Which OpenCV function reads an image file from disk?
- A. cv2.read()
- B. cv2.imread() β (correct answer)
- C. cv2.load()
- D. cv2.open()
Explanation: cv2.imread(path) decodes an image file and returns it as a NumPy array (in BGR channel order by default), or None if the file cannot be read.
Question 10: What is the purpose of image Thresholding?
- A. Rotating the image by 90 degrees.
- B. Converting a grayscale image into a binary image (black and white) by comparing pixel intensities against a cutoff value. β (correct answer)
- C. Multiplying pixel values to increase brightness.
- D. Compressing the image size.
Explanation: Thresholding isolates objects of interest from backgrounds, mapping pixels to 0 or 255.
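A binary threshold is a one-line comparison on an array. The sketch below assumes a cutoff of 127 and maps pixels strictly greater than the cutoff to 255; both choices, and the name binary_threshold, are illustrative.

```python
import numpy as np

def binary_threshold(gray: np.ndarray, cutoff: int = 127) -> np.ndarray:
    """Map pixels above the cutoff to 255 (white) and the rest to 0 (black)."""
    return np.where(gray > cutoff, 255, 0).astype(np.uint8)

gray = np.array([[10, 127, 128],
                 [200, 50, 255]], dtype=np.uint8)
binary = binary_threshold(gray)
print(binary)  # [[  0   0 255]
               #  [255   0 255]]
```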
Question 11: What is the core architecture difference between AlexNet and ResNet?
- A. AlexNet is deep, while ResNet is shallow.
- B. ResNet introduces skip connections (residual blocks) that bypass layers, mitigating the vanishing gradient problem in extremely deep networks. β (correct answer)
- C. AlexNet works only on black and white images.
- D. ResNet does not use convolutional layers.
Explanation: Skip connections let gradients flow directly back through the network, making it practical to train models with hundreds of layers (and, in experiments, more than a thousand).
Question 12: What does 'Pixel' stand for?
- A. Picture Element β (correct answer)
- B. Processing Element
- C. Program Entry
- D. Position Entity
Explanation: A pixel is the basic logical unit of programmable color in a digital image.
Question 13: In image processing, what is a 'Kernel'?
- A. The core operating system of the GPU.
- B. A small matrix (e.g. 3x3) of numbers used in convolution operations to apply effects like blurring or sharpening. β (correct answer)
- C. The image storage header.
- D. A compression codec.
Explanation: Kernels slide over images, computing dot products to yield new pixel values in feature maps.
Question 14: What is the difference between image classification and image localization?
- A. Classification identifies if an object is present, while localization draws a bounding box around the detected object. β (correct answer)
- B. Localization works only on videos.
- C. Classification is unsupervised, while localization is supervised.
- D. There is no difference.
Explanation: Classification answers 'what'; localization answers 'what' and 'where' (bounding box coordinates).
Question 15: What color space model is designed based on Hue, Saturation, and Value/Brightness?
- A. RGB
- B. HSV β (correct answer)
- C. CMYK
- D. YUV
Explanation: HSV decouples color intensity (Value) from chromaticity (Hue, Saturation), making it ideal for color filtering.
Question 16: Which OpenCV function displays an image in a window?
- A. cv2.show()
- B. cv2.imshow() β (correct answer)
- C. cv2.display()
- D. cv2.view()
Explanation: cv2.imshow(window_name, image_array) displays an image array in a named window on screen.
Question 17: What does Mean Average Precision (mAP) measure in object detection?
- A. The processing speed of the camera stream.
- B. A standard metric evaluating the accuracy of bounding boxes and class predictions across IoU thresholds. β (correct answer)
- C. The image compression ratio.
- D. The color brightness average.
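Explanation: mAP is the mean, over all classes, of each class's Average Precision (the area under its precision-recall curve). A detection counts as correct only if its IoU with a ground-truth box meets the chosen threshold, so the metric reflects both localization and classification quality; some benchmarks also average over several IoU thresholds.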
Question 18: What does Intersection over Union (IoU) evaluate?
- A. The connection speed of servers.
- B. The overlap accuracy of a predicted bounding box compared to the ground-truth bounding box. β (correct answer)
- C. The ratio of colors in an image.
- D. The resolution density.
Explanation: IoU divides the area of overlap between the predicted and ground-truth boxes by the area of their union, giving a score from 0 (no overlap) to 1 (identical boxes).
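The ratio can be computed directly from box corners. This sketch assumes axis-aligned boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2; the function name iou is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union for two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (if there is one).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an intersection of 0.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 (identical boxes)
print(iou((0, 0, 2, 2), (5, 5, 6, 6)))  # 0.0 (no overlap)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7, so 1/7
```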
Question 19: What is the purpose of Non-Maximum Suppression (NMS) in object detection algorithms?
- A. Removing background colors.
- B. Eliminating redundant, overlapping bounding boxes for the same object, keeping only the box with the highest confidence score. β (correct answer)
- C. Resizing bounding boxes to standard sizes.
- D. Speeding up GPU backpropagation.
Explanation: NMS sorts the boxes for a given label by confidence, keeps the highest-scoring one, and discards any remaining candidate whose IoU with an already kept box exceeds a threshold.
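A greedy version of this procedure fits in a short function. The sketch below assumes all boxes share one class label, uses (x1, y1, x2, y2) corners, and picks an IoU threshold of 0.5; the names nms and iou are illustrative, and production code usually relies on a vectorized library routine instead.

```python
def iou(box_a, box_b):
    """Intersection over Union for two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed; the third box is far away and survives.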
Question 20: Which library is a popular scientific computing package representing images as multidimensional arrays?
- A. Pandas
- B. NumPy β (correct answer)
- C. requests
- D. Jinja2
Explanation: NumPy stores images as n-dimensional arrays and vectorizes operations over them, making numerical pixel calculations fast.
Question 21: What does the HSV color model's 'Hue' dimension represent?
- A. The purity of the color.
- B. The base color itself, expressed as an angle from 0 to 360 degrees. β (correct answer)
- C. The brightness of the color.
- D. The opacity of the color.
Explanation: Hue maps colors onto a wheel (e.g. Red is 0, Green is 120, Blue is 240 degrees). Note that OpenCV's 8-bit HSV images store hue halved, in the range 0 to 179.
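The hue angles quoted above can be checked with Python's standard colorsys module, which returns hue as a fraction of a full turn. The wrapper hue_degrees is a hypothetical helper that rescales that fraction to degrees.

```python
import colorsys

def hue_degrees(r: int, g: int, b: int) -> int:
    """Return the hue of an 8-bit RGB color as an angle in degrees."""
    # colorsys expects channels in [0, 1] and returns hue in [0, 1).
    h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360)

print(hue_degrees(255, 0, 0))    # 0   (red)
print(hue_degrees(0, 255, 0))    # 120 (green)
print(hue_degrees(0, 0, 255))    # 240 (blue)
print(hue_degrees(255, 255, 0))  # 60  (yellow, halfway between red and green)
```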
Question 22: In CNNs, what does a 1x1 Convolution accomplish?
- A. It resizes image width and height to 1 pixel.
- B. It performs a learned, channel-wise pooling: it mixes the channels at each pixel and can reduce the number of feature channels (depth) without altering spatial resolution. β (correct answer)
- C. It acts as a static identity matrix.
- D. It resets weight parameters.
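Explanation: 1x1 convolutions act as projection layers: each output channel is a learned weighted sum of the input channels at the same pixel, so the depth can be reduced (or expanded) to save computation.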
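Because a 1x1 kernel sees only one pixel at a time, the operation reduces to a matrix multiplication over the channel axis. The sketch below assumes a channels-last layout (H, W, C) and omits bias and activation; conv1x1 and the random weights are illustrative stand-ins for learned parameters.

```python
import numpy as np

def conv1x1(feature_map: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Apply a 1x1 convolution.

    feature_map: (H, W, C_in), weights: (C_in, C_out).
    Each pixel's channel vector is projected independently.
    """
    return feature_map @ weights

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 64))   # 64 input channels
weights = rng.standard_normal((64, 16))         # project down to 16 channels

reduced = conv1x1(feature_map, weights)
print(feature_map.shape, "->", reduced.shape)   # (8, 8, 64) -> (8, 8, 16)
```

Height and width are untouched; only the depth changes from 64 to 16.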
Question 23: What is a grayscale image pixel range in 8-bit representation?
- A. 0 to 100
- B. 0 to 255 β (correct answer)
- C. -128 to 127
- D. 0 to 1
Explanation: 8-bit intensity ranges from 0 (pure black) to 255 (pure white).
Question 24: What does the 'stride' parameter define in a convolutional layer?
- A. The learning rate of the optimizer.
- B. The step size or pixel jump the kernel filter takes when scanning across the input image. β (correct answer)
- C. The thickness of the border lines.
- D. The batch size of image inputs.
Explanation: Larger strides produce smaller outputs (e.g., a stride of 2 roughly halves each spatial dimension).
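The effect of stride on output size follows the usual formula floor((n + 2p - k) / s) + 1 for input size n, kernel size k, padding p, and stride s. The helper below is a small sketch of that arithmetic; its name and the example sizes are illustrative.

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial dimension of a convolution."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, kernel=3, stride=1, padding=1))  # 32
print(conv_output_size(32, kernel=3, stride=2, padding=1))  # 16
print(conv_output_size(32, kernel=3, stride=1, padding=0))  # 30
```

With a 3x3 kernel and one pixel of padding, stride 1 preserves a 32-pixel dimension while stride 2 halves it to 16.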
Question 25: How does Instance Segmentation differ from Semantic Segmentation?
- A. Instance segmentation works only in real-time video feeds.
- B. Semantic segmentation labels pixels by class category, while Instance segmentation distinguishes between individual objects of the same class. β (correct answer)
- C. Instance segmentation does not classify pixels.
- D. There is no difference.
Explanation: Semantic segmentation groups all 'cars' in one color; Instance segmentation colors each distinct 'car' differently.
Question 26: Which function saves an image array to a file in OpenCV?
- A. cv2.save()
- B. cv2.imwrite() β (correct answer)
- C. cv2.export()
- D. cv2.write()
Explanation: cv2.imwrite(path, image_array) encodes and saves an image as PNG, JPEG, or another format, chosen from the file extension.
Question 27: What does a 'Histogram' of an image show?
- A. The history of modifications made to the file.
- B. The distribution of pixel intensity values, displaying the count of pixels at each gray/color level. β (correct answer)
- C. The coordinates of bounding boxes.
- D. The configuration properties of the camera.
Explanation: A histogram summarizes an image's brightness and contrast distribution, which guides adjustments such as contrast stretching or histogram equalization.
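For an 8-bit grayscale image, the histogram is simply a count of pixels at each of the 256 intensity levels. The following sketch uses np.bincount for that count; intensity_histogram is a hypothetical helper name.

```python
import numpy as np

def intensity_histogram(gray: np.ndarray) -> np.ndarray:
    """Count how many pixels fall on each of the 256 gray levels."""
    # minlength=256 guarantees one bin per level even if some are unused.
    return np.bincount(gray.ravel(), minlength=256)

gray = np.array([[0, 0, 255],
                 [128, 255, 255]], dtype=np.uint8)
hist = intensity_histogram(gray)

print(hist[0], hist[128], hist[255])  # 2 1 3
print(hist.sum() == gray.size)        # True
```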
Question 28: What is the difference between a Pooling layer and a Convolutional layer?
- A. Pooling layers have trainable weights, while Convolutional layers do not.
- B. Convolutional layers extract features using weights, while Pooling layers downsample maps using static formulas (like Max or Average). β (correct answer)
- C. Pooling is used only for text classification.
- D. There is no difference.
Explanation: Pooling layers contain no trainable weights; they reduce spatial dimensions by applying a fixed function (such as max or average) to each window.
Question 29: Why is padding (e.g., 'same' padding) applied to images before convolutions?
- A. To decrease image brightness.
- B. To prevent the spatial dimensions of the feature map from shrinking after scanning with a kernel. β (correct answer)
- C. To encrypt the image border pixels.
- D. To increase the processing speed.
Explanation: Padding adds a border (usually zeros) so the kernel can be centered on edge pixels too; with 'same' padding and a stride of 1, the output keeps the input's spatial dimensions.
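Zero padding can be illustrated with np.pad. This sketch assumes a 3x3 kernel, for which one pixel of padding on each side is enough to keep the output the same size as the input at stride 1; the variable names are illustrative.

```python
import numpy as np

image = np.arange(1, 10).reshape(3, 3)   # 3x3 input
kernel_size = 3
pad = (kernel_size - 1) // 2             # 1 pixel per side for a 3x3 kernel

# Surround the image with a border of zeros.
padded = np.pad(image, pad_width=pad, mode="constant", constant_values=0)
print(padded)
# [[0 0 0 0 0]
#  [0 1 2 3 0]
#  [0 4 5 6 0]
#  [0 7 8 9 0]
#  [0 0 0 0 0]]

# Sliding a 3x3 window over the 5x5 padded image gives 5 - 3 + 1 = 3
# positions per axis, matching the original 3x3 size.
out_size = padded.shape[0] - kernel_size + 1
print(out_size)  # 3
```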
Question 30: What is the function of the cv2.waitKey(0) statement?
- A. Pauses the CPU thread for 10 seconds.
- B. Blocks execution and waits indefinitely until a keyboard key is pressed, keeping display windows open in the meantime. β (correct answer)
- C. Records video frames.
- D. Closes the terminal session.
Explanation: cv2.waitKey(0) blocks until a key is pressed and returns that key's code, which keeps windows on screen. The windows are typically closed afterwards with cv2.destroyAllWindows().