CHAPTER 28 Beginner

Updated: May 17, 2026

30 min read

# Data Analysis with Python Basics

Welcome to Chapter 28! Python is one of the most widely used languages for data analysis. In this chapter, you'll learn its two core libraries, NumPy (numerical computing) and Pandas (data manipulation), and create basic charts with Matplotlib.

---

## 1. Learning Objectives

- Use NumPy for numerical operations.
- Use Pandas for data manipulation.
- Work with DataFrames and Series.
- Read and analyze CSV files.
- Create basic data visualizations.

---

## 2. NumPy Basics

```bash
pip install numpy
```

```python
import numpy as np

# Creating arrays
arr = np.array([1, 2, 3, 4, 5])
print(f"Array: {arr}")
print(f"Shape: {arr.shape}")
print(f"Type: {arr.dtype}")

# Array operations (vectorized, no loops needed)
print(f"Sum: {arr.sum()}")
print(f"Mean: {arr.mean()}")
print(f"Std: {arr.std():.2f}")
print(f"Max: {arr.max()}")

# Element-wise operations
print(f"Doubled: {arr * 2}")
print(f"Squared: {arr ** 2}")

# Creating special arrays
zeros = np.zeros(5)
ones = np.ones((3, 3))
range_arr = np.arange(0, 10, 2)   # [0, 2, 4, 6, 8]
linspace = np.linspace(0, 1, 5)   # [0, 0.25, 0.5, 0.75, 1.0]
random_arr = np.random.rand(5)    # 5 random floats in [0, 1)

print(f"Range: {range_arr}")
print(f"Linspace: {linspace}")
```
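To make the "no loops needed" point concrete, here is a minimal sketch that compares a plain Python list comprehension with the equivalent vectorized NumPy expression. The sample data and the names `prices`, `discounted_loop`, and `discounted_vec` are illustrative assumptions, not part of the chapter's examples.

```python
import numpy as np

prices = [10.0, 20.0, 30.0, 40.0]  # hypothetical sample data

# Pure-Python version: an explicit loop over each element
discounted_loop = [p * 0.5 for p in prices]

# NumPy version: one expression applied to the whole array at once
prices_arr = np.array(prices)
discounted_vec = prices_arr * 0.5

# Both approaches produce the same values
print(discounted_loop)
print(discounted_vec)
print(np.allclose(discounted_loop, discounted_vec))  # True

# Boolean masks are vectorized too: keep only prices above 15
expensive = prices_arr[prices_arr > 15]
print(expensive)
```

On small inputs the two versions behave the same; the vectorized form tends to pay off as arrays grow, because the per-element work happens inside NumPy rather than in the Python interpreter.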

### 2D Arrays (Matrices)

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(f"Shape: {matrix.shape}")            # (3, 3)
print(f"Element [1,2]: {matrix[1, 2]}")    # 6
print(f"Row 0: {matrix[0]}")               # [1, 2, 3]
print(f"Col 1: {matrix[:, 1]}")            # [2, 5, 8]
print(f"Sum: {matrix.sum()}")              # 45
print(f"Row sums: {matrix.sum(axis=1)}")   # [6, 15, 24]
```
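The `axis` argument is often the least intuitive part of working with 2D arrays. The sketch below reuses the same 3x3 matrix to show how `axis=0` and `axis=1` differ, along with range slicing and the transpose. The variable names `col_sums`, `row_means`, and `top_right` are chosen here for illustration.

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# axis=0 collapses the rows, giving one result per column
col_sums = matrix.sum(axis=0)
print(f"Column sums: {col_sums}")   # [12 15 18]

# axis=1 collapses the columns, giving one result per row
row_means = matrix.mean(axis=1)
print(f"Row means: {row_means}")    # [2. 5. 8.]

# Slicing with ranges selects a sub-matrix (rows 0-1, columns 1-2)
top_right = matrix[:2, 1:]
print(top_right)

# .T returns the transpose (rows become columns)
print(matrix.T)
```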

---

## 3. Pandas Basics

```bash
pip install pandas
```

```python
import pandas as pd

# Series (1D labeled array)
grades = pd.Series([85, 92, 78, 95, 88],
                   index=["Alice", "Bob", "Charlie", "Diana", "Eve"])
print(grades)
print(f"\nMean: {grades.mean():.1f}")
print(f"Max: {grades.max()} ({grades.idxmax()})")
```
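Because a Series carries labels, individual values can be looked up by name as well as by position. The following sketch, which reuses the `grades` Series, shows label access, positional access, and boolean filtering; the names `top` and `curved` are assumptions made for this example.

```python
import pandas as pd

grades = pd.Series([85, 92, 78, 95, 88],
                   index=["Alice", "Bob", "Charlie", "Diana", "Eve"])

# Access by label with .loc, or by integer position with .iloc
print(grades.loc["Bob"])     # 92
print(grades.iloc[0])        # 85

# Boolean filtering keeps the labels of the matching entries
top = grades[grades >= 90]
print(top)

# Arithmetic is vectorized, just like NumPy
curved = grades + 5
print(curved.loc["Charlie"])  # 83
```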

### DataFrames (2D labeled table)

```python
import pandas as pd

# Creating a DataFrame from a dict
data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "Age": [25, 30, 28, 22, 27],
    "City": ["NYC", "LA", "Chicago", "NYC", "LA"],
    "Salary": [75000, 85000, 70000, 65000, 90000],
    "Department": ["Engineering", "Marketing", "Engineering", "HR", "Marketing"]
}

df = pd.DataFrame(data)
print(df)
print(f"\nShape: {df.shape}")
print("\nInfo:")
df.info()  # info() prints its report directly and returns None
print("\nStatistics:")
print(df.describe())
```
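Beyond `info()` and `describe()`, a few other inspection calls are commonly used when first looking at a DataFrame. This is a small sketch on a trimmed-down copy of the table above; the variable names `city_counts` and `n_cities` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "City": ["NYC", "LA", "Chicago", "NYC", "LA"],
    "Salary": [75000, 85000, 70000, 65000, 90000],
})

# dtypes shows the data type pandas inferred for each column
print(df.dtypes)

# head(n) and tail(n) return the first and last n rows
print(df.head(2))
print(df.tail(2))

# value_counts tallies how often each value occurs in a column
city_counts = df["City"].value_counts()
print(city_counts)

# nunique counts the distinct values
n_cities = df["City"].nunique()
print(n_cities)  # 3
```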

---

## 4. DataFrame Operations

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "Age": [25, 30, 28, 22, 27],
    "Salary": [75000, 85000, 70000, 65000, 90000],
    "Dept": ["Eng", "Mkt", "Eng", "HR", "Mkt"]
})

# Selecting columns
print(df["Name"])             # Single column (Series)
print(df[["Name", "Age"]])    # Multiple columns (DataFrame)

# Filtering rows
seniors = df[df["Age"] > 25]
high_salary = df[df["Salary"] >= 80000]
eng_team = df[df["Dept"] == "Eng"]

print("\nSeniors (age > 25):")
print(seniors)

# Sorting
sorted_df = df.sort_values("Salary", ascending=False)
print("\nSorted by Salary:")
print(sorted_df)

# Adding columns
df["Bonus"] = df["Salary"] * 0.1
df["Tax"] = df["Salary"] * 0.2

# Group by
dept_avg = df.groupby("Dept")["Salary"].mean()
print("\nAverage Salary by Department:")
print(dept_avg)
```
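The filters above each use a single condition. As a hedged extension, the sketch below shows how conditions can be combined, how `.loc` selects rows and a column together, and how `.agg` computes several statistics per group. The names `young_high`, `eng_names`, and `summary` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "Age": [25, 30, 28, 22, 27],
    "Salary": [75000, 85000, 70000, 65000, 90000],
    "Dept": ["Eng", "Mkt", "Eng", "HR", "Mkt"],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
young_high = df[(df["Age"] < 30) & (df["Salary"] >= 75000)]
print(young_high)

# .loc selects rows by condition and a column by name in one step
eng_names = df.loc[df["Dept"] == "Eng", "Name"]
print(eng_names.tolist())   # ['Alice', 'Charlie']

# Several aggregations per group with .agg
summary = df.groupby("Dept")["Salary"].agg(["mean", "min", "max", "count"])
print(summary)
```

Note that Python's `and` and `or` keywords do not work on Series; the element-wise operators `&` and `|` are needed instead.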

---

## 5. Reading CSV Files

```python
import pandas as pd

# Create sample CSV
sample_data = pd.DataFrame({
    "Date": ["2025-01-01", "2025-01-02", "2025-01-03", "2025-01-04"],
    "Product": ["Laptop", "Phone", "Tablet", "Laptop"],
    "Quantity": [5, 12, 8, 3],
    "Price": [999.99, 699.99, 499.99, 1099.99]
})
sample_data.to_csv("sales.csv", index=False)

# Read CSV
df = pd.read_csv("sales.csv")
print(df.head())

# Quick analysis
print(f"\nTotal Revenue: ${(df['Quantity'] * df['Price']).sum():,.2f}")
print(f"Average Price: ${df['Price'].mean():,.2f}")
print(f"Most Sold: {df.loc[df['Quantity'].idxmax(), 'Product']}")
```
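By default `read_csv` loads the `Date` column as plain strings. The sketch below shows one way to parse it into real dates with the `parse_dates` parameter and then compute revenue per product. To stay self-contained it reads the same rows from an in-memory string through `io.StringIO` rather than from `sales.csv`; that substitution and the `Revenue` column name are assumptions of this example.

```python
import io
import pandas as pd

# In-memory stand-in for a file on disk (same rows as the sample above)
csv_text = """Date,Product,Quantity,Price
2025-01-01,Laptop,5,999.99
2025-01-02,Phone,12,699.99
2025-01-03,Tablet,8,499.99
2025-01-04,Laptop,3,1099.99
"""

# parse_dates converts the Date column to datetime64 instead of plain strings
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(df.dtypes)

# Derived column, then total revenue per product
df["Revenue"] = df["Quantity"] * df["Price"]
revenue_by_product = df.groupby("Product")["Revenue"].sum()
print(revenue_by_product.round(2))

# Datetime columns expose helpers through the .dt accessor
print(df["Date"].dt.day_name().tolist())
```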

---

## 6. Data Cleaning

```python
import pandas as pd
import numpy as np

# Sample data with issues
df = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "Diana", "Eve"],
    "Age": [25, np.nan, 28, 22, 27],
    "Score": [85, 92, 78, np.nan, 88]
})

print("Before cleaning:")
print(df)
print(f"\nMissing values:\n{df.isnull().sum()}")

# Fill missing values (assign the result back instead of using inplace=True
# on a selected column, which is unreliable in recent pandas versions)
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Name"] = df["Name"].fillna("Unknown")

# Drop rows that still contain any NaN (here: the row with a missing Score)
df_clean = df.dropna()

print("\nAfter cleaning:")
print(df_clean)
```
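Filling with the mean and dropping rows are not the only options. As a rough sketch, the example below shows linear interpolation for ordered data, dropping rows based on specific columns only, and filling with the median. The `readings` Series and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered readings with gaps
readings = pd.Series([10.0, np.nan, 14.0, np.nan, np.nan, 20.0])

# Linear interpolation fills each gap from its neighbours
filled = readings.interpolate()
print(filled.tolist())   # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Score": [85, np.nan, 78],
})

# subset= drops a row only when the listed columns are missing
named_only = df.dropna(subset=["Name"])
print(named_only)

# The median is less sensitive to outliers than the mean
df["Score"] = df["Score"].fillna(df["Score"].median())
print(df)
```

Interpolation only makes sense when the row order is meaningful, for example in a time series.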

---

## 7. Basic Visualization

```python
# Note: Install matplotlib first: pip install matplotlib
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend (saves files, opens no window)
import matplotlib.pyplot as plt

# Sales data
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [12000, 15000, 13500, 17000, 19000, 22000]

# Line chart
plt.figure(figsize=(10, 5))
plt.plot(months, sales, marker='o', color='#3498db', linewidth=2)
plt.title("Monthly Sales 2025", fontsize=16)
plt.xlabel("Month")
plt.ylabel("Sales ($)")
plt.grid(True, alpha=0.3)
plt.savefig("sales_chart.png", dpi=100, bbox_inches='tight')
plt.close()
print("📊 Chart saved as sales_chart.png")

# Bar chart
departments = ["Engineering", "Marketing", "HR", "Sales"]
headcount = [45, 30, 15, 25]

plt.figure(figsize=(8, 5))
plt.bar(departments, headcount, color=['#3498db', '#e74c3c', '#2ecc71', '#f39c12'])
plt.title("Department Headcount")
plt.ylabel("Employees")
plt.savefig("departments.png", dpi=100, bbox_inches='tight')
plt.close()
print("📊 Chart saved as departments.png")
```
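The charts above use plain lists and the `plt.*` functions. A common alternative, sketched here under the assumption that the data already lives in a DataFrame, is to create a figure and axes with `plt.subplots` and draw on the axes object. The file name `sales_bar.png` and the `output_path` variable are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend, as in the examples above
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "Sales": [12000, 15000, 13500, 17000, 19000, 22000],
})

# plt.subplots returns a Figure and an Axes; drawing on the Axes keeps
# each chart independent of global pyplot state
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(df["Month"], df["Sales"], color="#3498db")
ax.set_title("Monthly Sales (bar chart)")
ax.set_xlabel("Month")
ax.set_ylabel("Sales ($)")

output_path = "sales_bar.png"  # hypothetical output file name
fig.savefig(output_path, dpi=100, bbox_inches="tight")
plt.close(fig)
print(f"Chart saved as {output_path}")
```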

---

## 8. MCQs with Answers

Q1: For numerical operations, NumPy arrays are:

A) Slower than lists B) Faster than lists C) Same speed D) Only for strings

Answer: B. NumPy runs array operations in optimized C code.

Q2: `df.head()` shows:

A) Last 5 rows B) First 5 rows C) All rows D) Column names

Answer: B

Q3: `df.describe()` provides:

A) Column names B) Statistical summary C) Data types D) Missing values

Answer: B

Q4: `df.groupby()` does:

A) Sorts data B) Groups and aggregates C) Filters data D) Merges data

Answer: B

Q5: `pd.read_csv()` returns:

A) List B) Dict C) DataFrame D) Array

Answer: C

Q6: `df.isnull().sum()` counts:

A) Rows B) Columns C) Missing values per column D) Total cells

Answer: C

Q7: `np.zeros((3, 3))` creates:

A) 3x3 of ones B) 3x3 of zeros C) 1D array D) Error

Answer: B

Q8: `df.sort_values("col")` sorts by:

A) Index B) Column values C) Data type D) Memory

Answer: B

Q9: `fillna()` does:

A) Drops missing B) Fills missing values C) Finds missing D) Counts missing

Answer: B

Q10: A Pandas Series is:

A) 2D B) 1D labeled array C) Dict D) Matrix

Answer: B

---

## 9. Interview Questions

1. NumPy vs Python lists? NumPy arrays are faster for numerical work (C-optimized), support vectorized operations, use less memory, and hold a single fixed data type.
2. What is a DataFrame? A 2D labeled data structure with rows and columns (like a spreadsheet or SQL table).
3. How to handle missing data? Use `dropna()`, `fillna()`, or `interpolate()`. The right choice depends on context.
4. Series vs DataFrame? A Series is 1D; a DataFrame is 2D. A DataFrame is a collection of Series that share an index.
5. How to merge DataFrames? Use `pd.merge()` (SQL-like joins), `pd.concat()` (stacking), or `df.join()`.
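The merge question is easier to answer with a concrete case in mind. Here is a minimal sketch of an inner join, a left join, and a `concat`, using two small hypothetical tables (`employees` and `departments`) that do not appear elsewhere in the chapter.

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "DeptId": [1, 2, 3],
})
departments = pd.DataFrame({
    "DeptId": [1, 2],
    "Dept": ["Engineering", "Marketing"],
})

# Inner join (the default): only rows whose key appears in both tables
inner = pd.merge(employees, departments, on="DeptId")
print(inner)

# Left join: keep every employee, fill missing department info with NaN
left = pd.merge(employees, departments, on="DeptId", how="left")
print(left)

# concat stacks DataFrames with the same columns on top of each other
new_hires = pd.DataFrame({"Name": ["Diana"], "DeptId": [2]})
all_employees = pd.concat([employees, new_hires], ignore_index=True)
print(all_employees)
```

Charlie's `DeptId` of 3 has no match, so he disappears from the inner join but stays in the left join with a missing `Dept`.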
---

## 10. Summary

- NumPy provides fast numerical computing with arrays.
- Pandas provides DataFrames for data manipulation and analysis.
- Key Pandas operations: `head()`, `describe()`, `groupby()`, `sort_values()`, `merge()`.
- Handle missing data with `dropna()` and `fillna()`.
- Visualize data with Matplotlib.

---

## 11. Next Chapter Recommendation

In Chapter 29: Python Interview Preparation, you'll prepare for technical interviews with 50 questions and 20 coding exercises! 🚀
