CHAPTER 28 Beginner

Data Visualization Interview Preparation

Updated: May 18, 2026

# CHAPTER 28

Data Visualization Interview Preparation

1. Chapter Introduction

This chapter compiles the most-asked data visualization interview questions from analytics, data science, and BI roles — covering visualization theory, chart selection, dashboard design, and practical coding challenges.

---

Section A: Theory Questions (Q1-25)

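Q1. What is data visualization and why does it matter? Data visualization is the graphical representation of data, which lets people identify patterns, trends, and outliers faster than by reading tables. The popular claim that humans process visuals 60,000x faster than text is widely repeated but has no traceable research source, so avoid quoting it as fact in an interview.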

Q2. What is Edward Tufte's Data-Ink Ratio? The ratio of ink encoding data to total ink used. Maximize it by removing chart junk (decorations, unnecessary gridlines, 3D effects) and keeping only data-encoding elements.
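
As a rough illustration of raising the data-ink ratio, the sketch below strips non-data elements from a Matplotlib bar chart. The `declutter` helper and the sample values are hypothetical, invented for this example; they are one possible way to apply the idea, not a standard recipe.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def declutter(ax):
    """Hypothetical helper: remove ink that does not encode data."""
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.grid(False)             # no gridlines
    ax.tick_params(length=0)   # no tick marks
    return ax


# Made-up values for illustration only
regions = ["North", "South", "East", "West"]
sales = [42, 31, 55, 27]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.bar(regions, sales, color="#1565C0")
declutter(ax)

# Direct value labels make the y-axis redundant, so it can go too
ax.bar_label(bars)
ax.set_yticks([])
ax.spines["left"].set_visible(False)
```

Whether to drop the y-axis entirely is a judgment call: it works when every bar is labeled, and less well when readers need to estimate values between bars.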

Q3. What are pre-attentive attributes? Visual properties processed by the brain in <250ms before conscious attention: position, length, color hue, color value, size, shape, orientation. Position is most accurate for quantitative comparison.

Q4. When would you use a bar chart vs. a line chart? Bar charts: categorical comparison (products, regions). Line charts: continuous time-based trends. Rule: if the X-axis holds discrete categories, where interpolating between points is meaningless, use bars; if it is continuous (usually time), use lines.

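Q5. What is the Gestalt closure principle in visualization? The brain mentally completes incomplete shapes. In charts, this is why a plot with only left and bottom axes still reads as a bounded area, and why area charts and closed polygons (radar charts) feel complete even when truncated.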

Q6. What is a choropleth map? A map where geographic regions are colored based on a statistical variable — GDP, population, sales. Color intensity encodes magnitude (sequential palette) or direction (diverging palette).

Q7. What is the 80/20 rule in data visualization? The Pareto principle: ~80% of value comes from 20% of factors. Visualized with a Pareto chart (sorted bar + cumulative % line). Used in product analysis, defect analysis, sales attribution.
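
A minimal Pareto chart sketch, assuming the input is a pandas Series of counts per category. The function name `pareto_chart` and the defect counts are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def pareto_chart(counts: pd.Series):
    """Sorted bars plus a cumulative-percentage line on a secondary axis."""
    ordered = counts.sort_values(ascending=False)
    cum_pct = ordered.cumsum() / ordered.sum() * 100
    positions = np.arange(len(ordered))

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(positions, ordered.values, color="#90CAF9")
    ax.set_xticks(positions)
    ax.set_xticklabels(ordered.index)
    ax.set_ylabel("Count")

    ax2 = ax.twinx()  # second y-axis sharing the same x-axis
    ax2.plot(positions, cum_pct.values, color="#1565C0", marker="o")
    ax2.axhline(80, color="grey", linestyle="--", linewidth=1)  # 80% reference
    ax2.set_ylim(0, 105)
    ax2.set_ylabel("Cumulative %")
    return fig, cum_pct


# Hypothetical defect counts
defects = pd.Series({"Stain": 5, "Scratch": 120, "Misalign": 15, "Dent": 60})
fig, cum_pct = pareto_chart(defects)
```

Sorting before computing the cumulative sum is the step that makes this a Pareto chart; without it the cumulative line has no interpretation.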

Q8. What is overplotting and how do you fix it? Overplotting occurs when too many scatter points overlap, hiding density. Fixes: transparency (alpha), jitter, hexbin plot, 2D histogram, or sampling.
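
The fixes listed above can be compared side by side. This sketch uses synthetic data (an assumption for illustration) and shows a raw scatter, the same scatter with transparency, and a hexbin plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, correlated points: enough of them to overplot badly
rng = np.random.default_rng(0)
x = rng.normal(size=20_000)
y = 0.6 * x + rng.normal(scale=0.8, size=20_000)

fig, axes = plt.subplots(1, 3, figsize=(13, 4), sharex=True, sharey=True)

axes[0].scatter(x, y, s=5)
axes[0].set_title("Raw scatter (overplotted)")

# Transparency: dense regions build up darker color
axes[1].scatter(x, y, s=5, alpha=0.05)
axes[1].set_title("alpha=0.05")

# Hexbin: aggregate points into hexagonal bins and color by count
hb = axes[2].hexbin(x, y, gridsize=40, cmap="Blues", mincnt=1)
axes[2].set_title("hexbin")
fig.colorbar(hb, ax=axes[2], label="Points per bin")
```

Transparency keeps individual points visible at the edges, while hexbin gives a readable density scale; which one fits depends on whether outliers or overall density matter more.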

Q9. Sequential vs. diverging vs. qualitative color palettes? Sequential (Blues): ordered magnitude. Diverging (RdBu): positive/negative with meaningful midpoint. Qualitative (tab10): categorical — no implied order.
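
A short sketch of matching palette type to data type in Matplotlib, using the three colormaps named above. The data is random and purely illustrative. `TwoSlopeNorm` is used here as one way to pin the diverging midpoint to zero when the range is asymmetric.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

rng = np.random.default_rng(1)
magnitude = rng.uniform(0, 100, size=(6, 6))  # non-negative: sequential
change = rng.uniform(-3, 5, size=(6, 6))      # signed, zero is meaningful: diverging

fig, axes = plt.subplots(1, 3, figsize=(13, 4))

im0 = axes[0].imshow(magnitude, cmap="Blues")
axes[0].set_title("Sequential (Blues)")
fig.colorbar(im0, ax=axes[0])

# Map 0 to the colormap's neutral midpoint even though the range is -3..5
norm = TwoSlopeNorm(vmin=-3, vcenter=0, vmax=5)
im1 = axes[1].imshow(change, cmap="RdBu", norm=norm)
axes[1].set_title("Diverging (RdBu), centered at 0")
fig.colorbar(im1, ax=axes[1])

# Qualitative: distinct hues with no implied order
tab10 = plt.get_cmap("tab10")
for k, name in enumerate(["A", "B", "C"]):
    axes[2].plot(np.arange(10), rng.normal(size=10).cumsum(),
                 color=tab10(k), label=name)
axes[2].set_title("Qualitative (tab10)")
axes[2].legend()
```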

Q10. What is a Sankey diagram used for? Visualizing flows and quantities between stages — customer journeys, energy flows, financial transfers. Width of flows encodes volume.

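Q11. What is a violin plot? It combines a box plot's summary (median and quartiles) with a kernel density estimate (KDE) of the full distribution shape. It shows where data is dense and reveals multiple modes, unlike a box plot, which only shows the five-number summary and outliers.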
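
To make the difference concrete, the sketch below draws a box plot and a violin plot of the same two synthetic samples, one bimodal and one unimodal, chosen so their boxes look roughly alike. It uses Matplotlib's own `boxplot` and `violinplot` rather than Seaborn; the data is invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
# Two clusters around -2 and +2: quartiles land near the cluster centers
bimodal = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])
# One wide cluster with a similar interquartile range
unimodal = rng.normal(0, 3.0, 1000)

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(9, 4), sharey=True)

ax_box.boxplot([bimodal, unimodal])
ax_box.set_title("Box plot: shapes look alike")

parts = ax_violin.violinplot([bimodal, unimodal], showmedians=True)
ax_violin.set_title("Violin plot: two modes visible")

for ax in (ax_box, ax_violin):
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["bimodal", "unimodal"])
```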

Q12. What are small multiples (faceting)? The same chart repeated for different data subsets (e.g., the same line chart for each product), drawn on shared axes. This enables easy comparison without overlapping data. Implemented with FacetGrid in Seaborn.
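
A small-multiples sketch in plain Matplotlib, offered as an alternative to the Seaborn FacetGrid route mentioned above. The product names and sales figures are random placeholders. Shared axes are the key detail, since they keep the panels directly comparable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up monthly sales for four products, in long format
rng = np.random.default_rng(3)
products = ["A", "B", "C", "D"]
df = pd.DataFrame({
    "month": np.tile(np.arange(1, 13), len(products)),
    "product": np.repeat(products, 12),
    "sales": rng.integers(50, 150, size=12 * len(products)),
})

# One panel per product; sharex/sharey force identical scales
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, group) in zip(axes.flat, df.groupby("product")):
    ax.plot(group["month"], group["sales"], color="#1565C0")
    ax.set_title(f"Product {name}")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
fig.supxlabel("Month")
fig.supylabel("Sales")
```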

Q13. How do you design for color-blind viewers? Use colorblind-safe palettes (e.g., from ColorBrewer) and don't rely on color alone: add shape, pattern, or direct labels as a second encoding. Roughly 8% of men have red-green color blindness.
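
One way to apply redundant encoding is sketched below: each series gets a distinct color, marker, and line style, plus a direct label at its end. The hex codes are from the Okabe-Ito palette, used here as an example of a colorblind-safe set; the series data is random.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Four colors from the Okabe-Ito palette
COLORS = ["#E69F00", "#56B4E9", "#009E73", "#D55E00"]
MARKERS = ["o", "s", "^", "D"]
LINESTYLES = ["-", "--", "-.", ":"]

rng = np.random.default_rng(4)
x = np.arange(12)
# Made-up series for illustration
series = {name: rng.normal(size=12).cumsum()
          for name in ["North", "South", "East", "West"]}

fig, ax = plt.subplots(figsize=(8, 4))
for (name, y), color, marker, ls in zip(series.items(), COLORS, MARKERS, LINESTYLES):
    # Color, marker, and line style all distinguish the series
    ax.plot(x, y, color=color, marker=marker, linestyle=ls)
    # A direct label at the line end replaces a color-keyed legend
    ax.annotate(name, (x[-1], y[-1]), xytext=(6, 0),
                textcoords="offset points", va="center", color=color)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```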

Q14. What is a treemap? A hierarchical visualization where nested rectangles encode value through area. Good for part-of-whole with many items (e.g., file system, product categories). As a rule of thumb, show at most 3 hierarchy levels.

Q15. Why is 3D visualization generally discouraged? 3D perspective distorts proportions — nearer bars appear larger. Viewers must mentally compensate for the 3D angle. Only use 3D for inherently 3D data (surfaces, terrain).

---

Section B: Coding Challenges (Q16-35)

```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Challenge 1: Dual comparison bar chart with growth annotation
def comparison_chart(df, category_col, val1_col, val2_col, labels=('2023', '2024')):
    """Grouped bars for two periods, annotated with percentage growth."""
    x = np.arange(len(df))
    width = 0.35
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(x - width/2, df[val1_col], width, label=labels[0], color='#90CAF9')
    ax.bar(x + width/2, df[val2_col], width, label=labels[1], color='#1565C0')
    ax.set_xticks(x)
    ax.set_xticklabels(df[category_col])
    ax.legend()
    offset = ax.get_ylim()[1] * 0.02  # gap between bar top and label
    for i, (v1, v2) in enumerate(zip(df[val1_col], df[val2_col])):
        if v1 == 0:
            continue  # growth is undefined for a zero baseline
        growth = (v2 - v1) / v1 * 100
        ax.text(i, max(v1, v2) + offset, f'{growth:+.1f}%', ha='center',
                fontsize=9, color='green' if growth > 0 else 'red')
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    return fig

# Challenge 2: Heatmap with significance markers
def heatmap_with_significance(corr_matrix, p_matrix, threshold=0.05):
    """Correlation heatmap with '*' on cells whose p-value is below threshold."""
    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='RdBu_r',
                vmin=-1, vmax=1, center=0, ax=ax)
    for i in range(len(corr_matrix)):
        for j in range(len(corr_matrix.columns)):
            if p_matrix.iloc[i, j] < threshold:
                # Heatmap cells are 1x1; place the star in the upper-right corner
                ax.text(j + 0.8, i + 0.2, '*', fontsize=14, color='gold', fontweight='bold')
    return fig

# Challenge 3: Rolling metric with threshold band
def rolling_with_threshold(series, window=30, upper=None, lower=None):
    """Raw series, rolling mean, and optional upper/lower threshold lines."""
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(series.index, series, alpha=0.4, color='#90CAF9', linewidth=0.8, label='Daily')
    rolling = series.rolling(window).mean()
    ax.plot(series.index, rolling, color='#1565C0', linewidth=2.5, label=f'{window}d MA')
    # Compare against None so that a threshold of 0 is still drawn
    if upper is not None:
        ax.axhline(upper, color='red', linestyle='--', label='Upper threshold')
    if lower is not None:
        ax.axhline(lower, color='orange', linestyle='--', label='Lower threshold')
    if upper is not None and lower is not None:
        ax.fill_between(series.index, lower, upper, alpha=0.05, color='green')
    ax.legend()
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    return fig

# Challenge 4: Outlier-highlighted scatter
def highlight_outliers(x, y, labels=None, z_threshold=2.5):
    """Scatter plot that highlights points whose y z-score exceeds z_threshold."""
    # Convert to arrays so boolean masking also works for lists
    x, y = np.asarray(x), np.asarray(y)
    z_scores = np.abs((y - y.mean()) / y.std())
    outliers = z_scores > z_threshold
    fig, ax = plt.subplots(figsize=(9, 6))
    ax.scatter(x[~outliers], y[~outliers], alpha=0.5, color='#90CAF9', s=50)
    ax.scatter(x[outliers], y[outliers], color='#F44336', s=100, zorder=5, label='Outliers')
    if labels is not None:
        for xi, yi, lbl in zip(x[outliers], y[outliers], np.asarray(labels)[outliers]):
            ax.annotate(lbl, (xi, yi), xytext=(5, 5), textcoords='offset points')
    ax.legend()
    return fig

print("✅ 4 reusable visualization functions implemented!")
```
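
Challenge 2 takes a `p_matrix` as input but does not show where it comes from. One possible way to build it, assuming SciPy is available and Pearson correlation is what you want, is sketched below. The helper name `corr_and_pvalues` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def corr_and_pvalues(df: pd.DataFrame):
    """Return (correlation matrix, p-value matrix) for all numeric column pairs."""
    cols = df.columns
    n = len(cols)
    corr = pd.DataFrame(np.eye(n), index=cols, columns=cols)
    # Diagonal p-values are left at 0, so diagonal cells count as significant;
    # set them to 1 instead if the diagonal should not be starred.
    pvals = pd.DataFrame(np.zeros((n, n)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for j, b in enumerate(cols):
            if i < j:  # compute each pair once and mirror it
                r, p = pearsonr(df[a], df[b])
                corr.loc[a, b] = corr.loc[b, a] = r
                pvals.loc[a, b] = pvals.loc[b, a] = p
    return corr, pvals


# Synthetic data: y depends on x, z is independent noise
rng = np.random.default_rng(5)
x = rng.normal(size=200)
data = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),
    "z": rng.normal(size=200),
})
corr_matrix, p_matrix = corr_and_pvalues(data)
```

The two returned frames have the same shape and labels, so they can be passed straight to a function that expects a correlation matrix and a matching p-value matrix. With many columns, consider a multiple-comparison correction before starring cells.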

5. Interview Chart Critique

```text
COMMON INTERVIEW TASK: "Critique this chart and redesign it"

CRITIQUE FRAMEWORK (CASE method):
C (Context):  What data is shown? What question does it answer?
A (Accuracy): Are encodings accurate? Y-axis at zero? Correct chart type?
S (Story):    What's the key message? Is it clear in <5 seconds?
E (Execute):  How would you redesign it for maximum clarity?

Example critique response:
"This 3D pie chart has 12 slices. Issues:
1. 3D perspective distorts slice proportions
2. 12 slices are impossible to compare (angle perception)
3. No sorted order
4. Redundant legend

Redesign: Sorted horizontal bar chart, top 5 + 'Other' grouping,
value labels instead of legend, clean white background"
```
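
The redesign described in the example response could look roughly like this. The 12 category shares are invented for illustration, and `top_n_with_other` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def top_n_with_other(values: pd.Series, n: int = 5) -> pd.Series:
    """Keep the n largest items and fold the remainder into 'Other'."""
    ordered = values.sort_values(ascending=False)
    top = ordered.iloc[:n]
    rest = ordered.iloc[n:].sum()
    if rest > 0:
        top = pd.concat([top, pd.Series({"Other": rest})])
    return top


# Made-up shares (percent) for the 12 slices of the original pie
shares = pd.Series({
    "A": 28, "B": 19, "C": 14, "D": 10, "E": 8, "F": 6,
    "G": 5, "H": 4, "I": 3, "J": 1.5, "K": 1, "L": 0.5,
})
grouped = top_n_with_other(shares, n=5)

fig, ax = plt.subplots(figsize=(7, 4))
plot_data = grouped.iloc[::-1]  # barh draws bottom-up, so reverse for largest on top
bars = ax.barh(plot_data.index, plot_data.values, color="#1565C0")
ax.bar_label(bars, fmt="%.0f%%", padding=3)  # value labels instead of a legend
ax.set_xticks([])
for side in ("top", "right", "bottom"):
    ax.spines[side].set_visible(False)
```

Keeping "Other" last regardless of its size is a common convention, since it is a residual bucket rather than a category competing for rank.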

6. MCQs

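Question 1

Within what time window does pre-attentive processing occur?

Question 2

Which visual encoding is the most accurate for quantitative comparison?

Question 3

How is the data-ink ratio maximized?

Question 4

What does the width of a flow encode in a Sankey diagram?

Question 5

What kind of visualization does Seaborn's FacetGrid create?

Question 6

What does the CASE critique framework stand for?

Question 7

What advantage does a violin plot have over a box plot?

Question 8

Which scatter plot problem does setting alpha=0.3 address?

Question 9

What does a Pareto chart show?

Question 10

What does color-blind-safe design require?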

7. Summary

50 interview questions covering: theory (pre-attentive, Tufte, Gestalt), chart selection, dashboard design, and coding challenges (reusable functions, CASE critique). Master these patterns to confidently discuss visualization in data analyst, data scientist, and BI engineer interviews.

8. Next Chapter Recommendation

In Chapter 29: Performance Optimization for Large Datasets, we render millions of data points efficiently using sampling, aggregation, and specialized rendering tools.
