Hash Maps and Sets
# CHAPTER 4
Hash Maps and Sets
1. Chapter Introduction
If you are completely stuck on an interview problem and don't know what to do, your first instinct should be: "Can I use a Hash Map?" Hash Maps (Dictionaries in Python, HashMaps in Java, Objects/Maps in JS) and Hash Sets are the most powerful tools in a developer's arsenal. They perform the magic trick of trading a little bit of memory (Space) to achieve lightning-fast O(1) Time lookups. This chapter explains how Hash Tables work under the hood and how to apply them to classic interview patterns.2. What is a Hash Table?
An array requires you to know the exact numeric index to find data. A Hash Map allows you to look up data using a custom "Key" (like a string or a custom object). It stores data inKey-Value pairs.
*How it works (The Magic):*
-
1.
You want to store
{"John": 95}.
-
2.
The key
"John"is passed through a Hash Function.
-
3.
The Hash Function converts the string
"John"into an integer (e.g.,42).
-
4.
The system stores the value
95at index42in a hidden underlying array.
-
5.
Later, when you ask for
"John", it runs the hash function, instantly gets42, and returns95in O(1) Time.
3. Hash Sets vs. Hash Maps
-
Hash Map: Stores
Key -> Valuepairs. Used for mapping relationships or counting frequencies (e.g.,{"apple": 5, "banana": 2}).
-
Hash Set: Stores ONLY unique Keys (no values). Used purely to check for existence or remove duplicates. (e.g.,
{"apple", "banana"}). Checkingif "apple" in my_set:takes O(1) Time.
4. Hash Collisions (The Interview Question)
*Interviewer:* "What happens if the hash function generates the same index42 for both 'John' and 'Mike'?"
*Answer:* This is a Hash Collision. The most common way to resolve it is Chaining. Instead of storing just the value 95 at index 42, the array stores a Linked List. Both "John: 95" and "Mike: 88" are added to the Linked List at index 42. When retrieving, the map hashes to 42, then traverses the short linked list to find the exact key.
5. Pattern 1: The "Two Sum" Problem
This is the most famous interview question in the world (LeetCode #1). *Prompt:* Given an array of integersnums and an integer target, return the indices of the two numbers that add up to target.
*Brute Force:* Nested loops testing every pair. Time: O(N^2).
*Optimized (The Hash Map):*
We iterate through the array. For every number, we calculate the difference = target - num. Have we seen this difference before? We use a Hash Map to keep track of numbers we've seen and their indices.
*Why this is beautiful:* We reduced a quadratic O(N^2) problem down to a linear O(N) problem simply by trading O(N) Space for a Hash Map.
6. Pattern 2: Frequency Maps
Whenever a problem asks "Find the most frequent...", "Find the first non-repeating...", or "Are these two strings anagrams?", a Frequency Map is the solution.*Example: First Unique Character in a String*
7. Pattern 3: Finding Duplicates / Intersection
If you need to find elements that exist in *both* Array A and Array B (Intersection), or quickly remove duplicates from an array, use a Hash Set.8. Mini Project: Group Anagrams
*Prompt:* Given an array of strings["eat","tea","tan","ate","nat","bat"], group the anagrams together.
*The Trick:* What do "eat", "tea", and "ate" have in common? If you sort them alphabetically, they all become "aet".
*The Algorithm:*
-
1.
Create a Hash Map:
Map<String, List<String>>.
- 2. For each string, sort its characters to create a "key" ("aet").
- 3. Append the original string to the list located at that key in the map.
- 4. Return the values of the map.
9. Common Mistakes
-
Using Unhashable Types as Keys: In Python, you cannot use a
listor adictionaryas a key in a dictionary because they are mutable (can change). You must use an immutable type like a String, Integer, or Tuple.
-
Forgetting Space Complexity: When you declare
map = {}, you are using extra memory. You must state: "This takes O(N) extra space."
10. Best Practices
-
getOrDefault/.get(key, default): Always use these built-in methods when building frequency maps to elegantly handle keys that do not exist in the map yet, avoidingNullPointerExceptionorKeyError.
11. Exercises
-
1.
Trace the
twoSumalgorithm on paper fornums = [3, 2, 4],target = 6.
- 2. Explain how converting an Array to a Set affects Time and Space complexity.
12. MCQs
What is the average Time Complexity for inserting and retrieving an item from a Hash Map?
What is a "Hash Collision"?
How is a Hash Collision typically resolved internally by data structures like Java's HashMap?
What is the fundamental difference between a Hash Map and a Hash Set?
In the classic "Two Sum" optimal O(N) solution, what are we trading to achieve linear time?
When building a "Frequency Map" to count occurrences of characters in a string, what is the best practice for handling a character seen for the very first time?
If you use a Hash Map to count the frequencies of characters in an English word (only lowercase letters 'a' to 'z'), what is the Space Complexity?
Why can't you use an array (e.g., [1, 2, 3]) as a Key in a Python Dictionary?
If an interviewer asks you to find the "Intersection" of two arrays (elements that exist in both), what is the optimal data structure to use?
In the "Group Anagrams" problem, what is the clever "Key" used in the Hash Map to group the words "eat", "tea", and "ate" together?
14. Interview Questions
-
Q: "Design a Data Structure that supports
insert,delete, andgetRandomElementall in O(1) average time complexity." (LeetCode 380. Hint: You need to combine a Hash Map AND an Array).
15. FAQs
- Q: In an interview, should I explain how the Hash Function works internally?