CHAPTER 10 Beginner

Data Analysis with Pandas in Jupyter

Updated: May 18, 2026

1. Chapter Introduction

Python's built-in file handling works well for simple text files but becomes unwieldy for large tabular datasets. Enter Pandas. Pandas is a third-party Python library that provides the DataFrame, a powerful, programmable spreadsheet. Jupyter and Pandas pair well because Jupyter automatically renders a DataFrame as a styled HTML table whenever it is the last expression in a cell.

2. Installing and Importing Pandas

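If you installed Anaconda, Pandas is already included. If not, run %pip install pandas in a cell. (The older !pip install pandas form also works in most setups, but %pip makes sure the package is installed into the environment the notebook's kernel is using.)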

Cell 1:

python

import pandas as pd
# 'pd' is the universal industry standard alias for pandas
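
To confirm that the import succeeded, a quick check is to print the installed version. This is a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# Print the installed version to confirm the import works
print(pd.__version__)
```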

3. The Pandas DataFrame

A DataFrame is a 2-dimensional labeled data structure: rows and columns, each with its own label. Think of it as an Excel spreadsheet or a SQL table that you control from code.

Cell 2:

python

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'IT'],
    'Salary': [55000, 75000, 60000, 80000]
}

df = pd.DataFrame(data)

# In Jupyter, simply typing 'df' at the end of the cell 
# renders a beautiful, styled HTML table!
df
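
The "labeled" part of the definition can be made concrete with a short sketch. It rebuilds the same employee table and reads back its column labels, its row labels, and one value looked up by label. The table is the one from Cell 2; nothing else is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'IT'],
    'Salary': [55000, 75000, 60000, 80000],
})

# Column labels come from the dictionary keys
print(list(df.columns))   # ['Name', 'Age', 'Department', 'Salary']

# Row labels default to a simple integer range: 0, 1, 2, 3
print(list(df.index))     # [0, 1, 2, 3]

# .loc looks a value up by row label and column label
print(df.loc[2, 'Name'])  # Charlie
```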

4. Reading Data from a CSV

In the real world, you don't type data out by hand. You load it from a CSV (Comma Separated Values) file.

Cell 3:

python

# Load a CSV file into a DataFrame
# (Assumes 'employees.csv' is in the notebook's working directory.
# Uncomment the next line to load it; otherwise df from Cell 2 is reused.)
# df = pd.read_csv('employees.csv')

# Look at the first 3 rows
df.head(3)
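
If you do not have a CSV file at hand, the loading step can still be tried. The sketch below feeds pd.read_csv an in-memory text buffer instead of a file path; the CSV content is made up for illustration and simply mirrors the employee table from earlier.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'employees.csv'
csv_text = """Name,Age,Department,Salary
Alice,25,Sales,55000
Bob,30,IT,75000
Charlie,35,HR,60000
David,40,IT,80000
"""

# read_csv accepts a file path or any file-like object
employees = pd.read_csv(io.StringIO(csv_text))

# Preview the first 3 rows
print(employees.head(3))
```

With a real file, the call would be pd.read_csv('employees.csv') and the rest stays the same.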

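Jupyter Pro-Tip: If your dataset has 10,000 rows, avoid dumping the whole table with df. By default pandas truncates the output once a DataFrame exceeds display.max_rows (60 rows) and shows only the first and last few rows, but if that limit has been raised or removed, rendering a huge table can slow down or freeze your browser. Use df.head() to preview the first 5 rows, or df.tail() to see the last 5.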
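
The preview calls can be tried on a synthetic table. This is a minimal sketch with a throwaway 10,000-row DataFrame (the name big and its contents are made up); it also reads the display.max_rows option, which controls how many rows pandas is willing to render.

```python
import pandas as pd

# A throwaway 10,000-row DataFrame for illustration
big = pd.DataFrame({'value': range(10_000)})

# Rows beyond this limit are truncated in the rendered output
print(pd.get_option('display.max_rows'))

# head() and tail() return small DataFrames of 5 rows by default
print(big.head().shape)  # (5, 1)
print(big.tail().shape)  # (5, 1)

# Temporarily change the limit inside a with-block only
with pd.option_context('display.max_rows', 20):
    print(pd.get_option('display.max_rows'))  # 20
```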

5. Data Inspection

Once data is loaded, you must inspect it to understand its shape and data types.

Cell 4:

python

# Check how many rows and columns (Rows, Columns)
print("Shape:", df.shape)

# Get a summary of data types and non-null values
df.info()

# Get statistical summaries of numeric columns (mean, min, max, etc.)
df.describe()
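
A few more inspection calls are often useful alongside the ones above. The sketch below rebuilds the employee table so it runs on its own, then checks column types, counts missing values, and pulls individual numbers out of the describe() result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'IT'],
    'Salary': [55000, 75000, 60000, 80000],
})

# Data type of each column
print(df.dtypes)

# Count of missing values per column (all zero here)
print(df.isna().sum())

# describe() returns a DataFrame, so single statistics can be looked up
summary = df.describe()
print(summary.loc['mean', 'Salary'])  # 67500.0
print(summary.loc['max', 'Age'])      # 40.0
```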

6. Filtering and Querying Data

Pandas lets you select columns by name and filter rows with boolean conditions.

Cell 5:

python

# Select a single column (returns a Pandas Series)
ages = df['Age']

# Filter rows based on a condition (e.g., IT department only)
it_staff = df[df['Department'] == 'IT']

# Display the filtered data
it_staff
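
Filters can also combine several conditions. The sketch below, using the same employee table, shows one way to do that with & and with query(), and shows how single and double brackets differ in what they return. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'IT'],
    'Salary': [55000, 75000, 60000, 80000],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
senior_it = df[(df['Department'] == 'IT') & (df['Age'] > 35)]
print(senior_it['Name'].tolist())  # ['David']

# query() expresses the same filter as a string
same_rows = df.query("Department == 'IT' and Age > 35")
print(same_rows['Name'].tolist())  # ['David']

# Single brackets return a Series, double brackets return a DataFrame
print(type(df['Age']).__name__)    # Series
print(type(df[['Age']]).__name__)  # DataFrame
```

The parentheses matter because & binds more tightly than == and >, so leaving them out leads to an error.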

7. Basic Data Cleaning

Data is rarely perfect. Pandas provides tools to handle missing values (NaN).

Cell 6:

python

import numpy as np

# Let's introduce a missing value for demonstration
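# Cast Salary to float first: NaN is a float, and newer pandas versions
# warn about (or may reject) writing it into an integer column
df['Salary'] = df['Salary'].astype(float)
df.loc[1, 'Salary'] = np.nan
print("Data with missing salary:")
display(df)  # display() renders the HTML table even when df is not the last line of the cell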

# Option A: Fill missing values with a default number
df_filled = df.fillna(0)

# Option B: Drop any row that contains a missing value
df_dropped = df.dropna()
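
Filling with 0 is not always sensible for a column like salary. As a sketch of two common alternatives, the block below fills the gap with the column mean and drops rows only when one specific column is missing. The small table is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [55000, np.nan, 60000, 80000],
})

# mean() skips NaN: (55000 + 60000 + 80000) / 3 = 65000
mean_salary = df['Salary'].mean()

# fillna() returns a new DataFrame; df itself is unchanged
df_mean_filled = df.fillna({'Salary': mean_salary})
print(df_mean_filled.loc[1, 'Salary'])  # 65000.0

# Drop rows only when the Salary column is missing
df_with_salary = df.dropna(subset=['Salary'])
print(len(df_with_salary))  # 3

# The original still has its one missing value
print(int(df['Salary'].isna().sum()))  # 1
```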

8. Mini Project: Student Analytics Notebook

Create a new notebook and run this workflow.

Cell 1:

python

import pandas as pd
# Load student grades
grades = pd.DataFrame({
    'Student': ['Tom', 'Jane', 'Bill', 'Sara'],
    'Math': [88, 92, 79, 95],
    'Science': [85, 90, 82, 98]
})

Cell 2:

python

# Calculate Average Grade per student
grades['Average'] = (grades['Math'] + grades['Science']) / 2
grades

Cell 3:

python

# Find Top Performers (Average > 90)
top_students = grades[grades['Average'] > 90]
print("Students eligible for honors:")
display(top_students)
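
As an optional extension of the project, the same workflow can be written so that it scales to more subjects and also ranks the students. This is a sketch, not part of the original notebook: it computes the average with mean(axis=1) over the subject columns and sorts with sort_values().

```python
import pandas as pd

grades = pd.DataFrame({
    'Student': ['Tom', 'Jane', 'Bill', 'Sara'],
    'Math': [88, 92, 79, 95],
    'Science': [85, 90, 82, 98],
})

# Average across the subject columns, row by row
grades['Average'] = grades[['Math', 'Science']].mean(axis=1)

# Rank students from highest to lowest average
ranked = grades.sort_values('Average', ascending=False)
print(ranked['Student'].tolist())  # ['Sara', 'Jane', 'Tom', 'Bill']

# Same honors filter as in Cell 3
top_students = grades[grades['Average'] > 90]
print(top_students['Student'].tolist())  # ['Jane', 'Sara']
```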

9. Common Mistakes

Printing DataFrames: Beginners often write print(df). This outputs ugly, raw text aligned with spaces. Instead, use display(df) or simply leave df on the last line of the cell. Jupyter will render it as a styled HTML table.

Forgetting the alias: Always import pandas as pd. If you just import pandas, you have to type pandas.DataFrame every time instead of pd.DataFrame, which gets tedious.
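
The difference between the two output styles can be inspected directly. The sketch below uses a small made-up DataFrame: to_string() gives the kind of plain text that print() shows, while to_html() gives table markup similar to what Jupyter renders.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Plain-text form: columns aligned with spaces
text_version = df.to_string()
print(text_version)

# HTML form: a <table> element that a browser can style
html_version = df.to_html()
print(html_version[:6])  # <table
```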

10. MCQs

Question 1

What is the standard industry alias for importing Pandas?

Question 2

What is a Pandas DataFrame?

Question 3

How do you load a CSV file into a DataFrame?

Question 4

To view only the first 5 rows of a large dataset to prevent Jupyter from freezing, you use:

Question 5

Which function provides a statistical summary (mean, min, max) of numeric columns?

Question 6

How do you extract a single column named 'Age' from the DataFrame?

Question 7

What does `df.shape` return?

Question 8

Why is it better to type `df` on the last line of a cell instead of `print(df)`?

Question 9

If you want to force Jupyter to render the beautiful HTML table in the middle of a code cell (not just the last line), you use?

Question 10

What does `df.dropna()` do?

11. Interview Questions

Q: Explain the difference between a Pandas Series and a Pandas DataFrame.

Q: You load a CSV into Jupyter and it has 5 million rows. What commands do you run first to understand the data without crashing the notebook?
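
For the second question, one possible approach (a sketch, not the only valid answer) is to read just a few rows first and then process the file in chunks. The CSV text below is a small made-up stand-in for a very large file; nrows and chunksize are standard pd.read_csv parameters.

```python
import io
import pandas as pd

# Hypothetical stand-in for a very large CSV file
csv_text = "id,score\n" + "\n".join(f"{i},{i % 100}" for i in range(1000))

# Read only the first 5 rows to inspect the columns and dtypes
preview = pd.read_csv(io.StringIO(csv_text), nrows=5)
print(preview.shape)  # (5, 2)

# Process the file in chunks so it is never fully in memory at once
total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    total_rows += len(chunk)
print(total_rows)  # 1000
```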

12. Summary

Pandas is the engine of data science in Python. It provides the DataFrame for tabular data. By combining Pandas with Jupyter Notebooks, you get a highly visual, interactive spreadsheet environment. Use pd.read_csv() to load data, df.head() to preview it, and display(df) to ensure Jupyter renders it beautifully.

13. Next Chapter Recommendation

In Chapter 11: NumPy Integration in Jupyter, we will explore the math engine that powers Pandas under the hood, enabling lightning-fast calculations on massive datasets.
