Union-Find

Posted on Thu 15 January 2026 in posts

Introduction

In this post we will have a brief discussion on the Union-Find data structure, also called Disjoint-Set Union (DSU). This data structure is a great way to solve problems involving grouping elements into disjoint groups or sets, that is, sets or groups that have no elements in common.

We will cover a brief description of a textbook example of this problem and then I will show how I solved an advent of code problem using this structure (which is how I learned more about it in the first place while trying to solve it efficiently).

Textbook Example

Union-Find is typically applied to problems which require tracking of non-intersecting components in a larger collection.

A simple example is finding all connected components in an undirected graph.

A connected component is a subgraph in which any two vertices are connected by a path.

We can find these components by iterating over edges and creating unions between vertices that share an edge.

Advent of Code Puzzle Example

I recently encountered this data structure when solving Advent of Code 2025, day 8. The goal was to manipulate points with 3D coordinates to obtain circuits of points. The actual problem setting gives a nice contextualization which I recommend reading if interested, see here. Below I present example puzzle input for that problem.

"""162,817,812
57,618,57
906,360,560
592,479,940
352,342,300
466,668,158
542,29,236
431,825,988
739,650,466
52,470,668
216,146,977
819,987,18
117,168,530
805,96,715
346,949,466
970,615,88
941,993,340
862,61,35
984,92,344
425,690,689
"""

Let's simplify the task and just find all groups of points (circuits) that are closest by straight-line distance.

Let's do this in pseudocode first. We first find the closest pair of points. These two points form a circuit and this belong to the same set. Then we find the new pair of points that are closest but are not already connected via a set. If it in the new pair a point belongs to a previously seen set, we perform union operation.

Here is the python implementation (assuming points is our array of \((x, y, z)\) coordinates):

  for i in range(n):
      for j in range(n):
          if i > j:
              distances.append([i, j, dist(points[i], points[j])])
  # note that we could have used a heap in here
  distances = sorted(distances, key=lambda x: x[-1])
  union = UnionFind(n)
  for d in distances:
      union.union(d[0], d[1]) # this process will take care of unions

The equivalent C++ code:

  auto points = load_points("data.txt");
  size_t n = points.size();
  UnionFind union_find(n);
  std::vector<Tuple> pairs_and_distances;
  for (size_t i = 0; i < n; i++) {
    for (size_t j = 0; j < n; j++) {
      if (i > j) {
        float dist = distance(points[i], points[j]);
        pairs_and_distances.push_back(std::tuple<float, int, int>(dist, i, j));
      }
    }
  }
  std::sort(pairs_and_distances.begin(), pairs_and_distances.end());
  int count = 0;
  for (auto &[dist, i, j] : pairs_and_distances) {
    union_find.UnionRank(i, j);
  }

Full Python code is here and C++ version is here.

What makes Union-Find special?

The answer is simple: it's tremendously fast during search. You'll notice that in the snippet above I used find_compress and UnionRank. This is because this way we are optimizing via path compression. Long chains of parent-child relationships can lead to long search times.

The rank idea is to assign a rank to each node which will serve as an upper bound on the tree height (see section 19.3 in CLRS textbook). The compression idea is to make each node that we find point to the root, flattening the tree.

In fact, the implementation with these heuristics is so efficient that the projected runtime is inverse Ackerman function \(\alpha(n)\) which is \(\alpha(n) \leq 4\) for all practical values of \(n\). It grows so slowly that it is comparable to a constant for all practical purposes.