Skip to main content
Binary Trees & Graphs
CHAPTER 27 Beginner

Disjoint Set Union (Union-Find)

Updated: May 17, 2026
11 min read

# CHAPTER 27

Disjoint Set Union (Union-Find)

1. Introduction

In Chapter 25, we studied Kruskal's Minimum Spanning Tree algorithm. It sorts all edges globally and snaps them together one by one. But we mentioned a critical problem: *How does the algorithm instantly know if connecting Node A and Node B will accidentally draw a fatal cyclic loop?* If you run a full Graph DFS (Chapter 22) to check for a cycle every single time you add an edge, the Time Complexity degrades to an apocalyptic $O(E \times (V+E))$. The server will crash. To solve this, computer scientists invented a breathtakingly fast data structure: Disjoint Set Union (also known as Union-Find). It maps groupings of nodes so efficiently that detecting a cycle drops from $O(V+E)$ down to almost $O(1)$ constant time.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define the two primary functions: Find() and Union().
  • Implement a Parent tracking Array to represent Disjoint Sets.
  • Execute Path Compression to optimize the Find query.
  • Execute Union by Rank to optimize the Union merge.

3. The Core Concept (Sets and Representatives)

Imagine an isolated graph with 5 vertices (0, 1, 2, 3, 4) and absolutely no edges. Initially, every node is in its own isolated "Set". Node 0 is the boss of Set 0. Node 1 is the boss of Set 1. We create a Parent array to track this: [0, 1, 2, 3, 4].

Operation 1: Union(A, B) If we draw a physical Edge connecting Node 0 and Node 1, we must Union their Sets. We update the array so Node 1's parent is now Node 0. Parent Array: [0, 0, 2, 3, 4]. (Node 0 and Node 1 are now a connected family. Node 0 is the "Representative" boss).

Operation 2: Find(X) If we want to know what family a node belongs to, we call Find(). It climbs the Parent array until it finds the ultimate boss (a node whose parent is itself). Find(1) returns 0. Find(0) returns 0.

Cycle Detection Logic: If we pull a new Edge connecting Node A and Node B. We call Find(A) and Find(B).

  • If the answers are DIFFERENT, they belong to different families. Safe to connect! Call Union(A, B)!
  • If the answers are exactly the SAME, they already belong to the same family tree! Connecting them draws a line between two nodes that are already connected elsewhere... CYCLE DETECTED!

4. The Two Ultimate Optimizations

If you just connect nodes blindly (e.g., 1->2, 2->3, 3->4), the Parent array degrades into a massive, straight-line Linked List. Calling Find(4) forces the algorithm to take 4 steps to find the Boss ($O(N)$ time!). We must optimize.

Optimization 1: Union by Rank (Tree Balancing) We maintain a Rank (or Size) array. When calling Union(Set A, Set B), we always attach the *smaller* tree underneath the root of the *larger* tree. This mathematically guarantees the resulting tree remains shallow and balanced, keeping $O(\log N)$ search depth.

Optimization 2: Path Compression (The $O(1)$ Miracle) This is the true genius of Union-Find. When you call Find(4) and it traces 4 -> 3 -> 2 -> 1 -> Boss(0). Instead of just returning 0, the Find algorithm actively rewrites the array as it backtracks! It forcefully unplugs 4, 3, and 2, and plugs them ALL directly into Boss 0. The next time you call Find(4), it doesn't take 4 steps. It takes 1 step. The tree flattens itself dynamically as you search it!

5. Python Implementation (Production Code)

This code represents the ultra-optimized Union-Find architecture.
python
123456789101112131415161718192021222324252627282930313233343536
class UnionFind:
    def __init__(self, size):
        self.parent = [i for i in range(size)]  # Every node is its own boss initially
        self.rank   = [1] * size                 # Track tree depth for balancing

    # 1. The Find Function (With Path Compression)
    def find(self, i):
        if self.parent[i] == i:
            return i  # Found the Boss!
        # Recursive Path Compression — dynamically flatten the tree!
        self.parent[i] = self.find(self.parent[i])
        return self.parent[i]

    # 2. The Union Function (With Union by Rank)
    def union(self, i, j):
        root_i = self.find(i)
        root_j = self.find(j)

        if root_i != root_j:  # Only merge if they belong to different sets!
            # Attach smaller-rank tree under the larger-rank tree
            if self.rank[root_i] > self.rank[root_j]:
                self.parent[root_j] = root_i
            elif self.rank[root_i] < self.rank[root_j]:
                self.parent[root_i] = root_j
            else:
                self.parent[root_j] = root_i  # Tie-breaker
                self.rank[root_i] += 1        # Increase height
            return True   # Successfully Unioned!
        else:
            return False  # CYCLE DETECTED! Both have the same Root!

# Test it out!
uf = UnionFind(5)
print(uf.union(0, 1))  # True  (Safe)
print(uf.union(1, 2))  # True  (Safe)
print(uf.union(0, 2))  # False — Cycle Detected! 0 and 2 are already linked!

6. The Inverse Ackermann Function

Thanks to Path Compression and Union by Rank, the Time Complexity of Find and Union is mathematically defined as $O(\alpha(N))$, where $\alpha$ is the Inverse Ackermann Function. This function grows so impossibly slowly that for any number of vertices in the known universe (e.g., $N = 10^{80}$ atoms), the answer is $\le 4$. For all practical human intents and purposes, Union-Find executes in exactly $O(1)$ Amortized Constant Time.

7. Applications

  • Kruskal's MST Algorithm: Powering the rapid cycle-detection phase.
  • Image Processing: Detecting isolated "blobs" or connected islands of pixels.
  • Network Connectivity: Quickly checking if two computers in a massive server farm can talk to each other.

8. Common Mistakes

  • Forgetting Path Compression: When writing the recursive Find function, beginners often write return self.find(self.parent[i]). This finds the answer, but forgets to *save* it! You MUST write self.parent[i] = self.find(...) to flatten the array!

9. Exercises

  1. 1. Trace the Parent array state step-by-step for a 5-node setup doing the following: Union(0,1), Union(2,3), Union(1,3). Assume Union by Rank is used.
  1. 2. What would happen to the Time Complexity of Find if you implemented Union-Find using only the basic Parent array without Path Compression or Union by Rank?

10. MCQs with Answers

Question 1

What defines the foundational architectural goal of deploying a Disjoint Set Union (Union-Find) data structure inside a massive graph matrix?

Question 2

When an engineer executes the fundamental Find(X) function, what specific tracking variable is the algorithmic engine forcefully attempting to isolate and extract?

Question 3

If Kruskal's Algorithm extracts a new topological edge bridging Node X and Node Y, and subsequent Find(X) and Find(Y) calls violently evaluate to the exact identical Representative integer, what conclusion is triggered?

Question 4

To bypass severe depth degradation mimicking a sluggish Linked List, what physical restructuring geometry does "Union by Rank" aggressively enforce during the merging phase?

Question 5

Which breathtakingly elegant algorithmic optimization actively rewrites and fundamentally flattens the underlying geometric Parent array during the recursive backtracking phase of a Search query?

Question 6

What specific programmatic assignment operator MUST be explicitly coded into the recursive find sequence to activate Path Compression?

Question 7

What defines the nearly incomprehensible algorithmic Asymptotic Time Complexity rating representing a fully optimized Union-Find engine executing query metrics?

Question 8

When initializing a completely blank Disjoint Set framework housing exactly $V$ isolated vertices, what integer value natively populates the Rank array for every discrete index slot?

Question 9

If an un-optimized Disjoint Set engine (lacking Rank and Compression) processes heavily skewed, linear relational graphs, what catastrophic operational Time penalty degrades the Find extraction?

Q10. True or False: While incredibly powerful for undirected MST algorithms, Union-Find is completely mathematically useless for tracking "Strongly Connected Components" inside explicitly Directed Graphs (Digraphs) where vectors possess one-way arrows. a) True. Union-Find intrinsically assumes symmetric two-way edge traversal. Directed arrows violate the commutative parent-grouping logic required to validate overlapping clusters. b) False. Union-Find easily processes directional data flows via boolean masking. Answer: a) True. Union-Find intrinsically assumes symmetric two-way edge traversal. Directed arrows violate the commutative parent-grouping logic.

11. Interview Preparation

Top Interview Questions:
  • *Data Modeling:* "A server farm has 500 computers. We run a script adding cables between pairs of computers one at a time. After 300 cables are added, we want to know if Computer 12 and Computer 85 can communicate. What algorithm provides the fastest instant answer?" *(Answer: Union-Find! Set up 500 Disjoint Sets. Every time a cable is added, run Union(A, B). To check communication, simply run Find(12) == Find(85). With Path Compression, it executes in $O(1)$ time!)*

12. Summary

Disjoint Set Union (Union-Find) is a masterclass in algorithmic optimization. By shifting our focus from tracking explicit edges to tracking "Families" and Root Bosses, we simplify network analysis drastically. Through the sheer brilliance of Path Compression and Rank Balancing, operations that should logically take linear time are crushed down into the microscopic $O(1)$ Amortized boundaries, powering the world's most aggressive cycle detection engines.

13. Next Chapter Recommendation

We have mapped the standard playbook of Graph Theory: Traversals, Routing, Trees, and Cycle tracking. But the realm of networking harbors far deeper mathematical physics. How do you find the weakest link in a computer network? How do you map AI game logic? In Chapter 28: Advanced Graph Algorithms, we survey the elite tier of graph structures including Articulation Points, Bipartite Graphs, and A* Heuristic Search.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·