Disjoint Set Union (DSU)
Overview
A set is a collection of elements. If two sets have no common elements, then they are called disjoint sets. For example, and are disjoint sets while and are not because they have a common element .
Disjoint Set Union (or DSU or Union Find) is a data structure that allows us to combine any two sets into one. Let's say we have elements and we initialise an array with a size of . Here we have sets and each individual element in the set is the parent.
vector<int> root(10);
for(int i = 0; i < 10; i++) root[i] = i;
If we join the first element and together, we first check if they belong to the same parent. If they do, it means they have already in the same set. Otherwises, we can point one to another and update like which means the root of element is . We can make it more flexible to check if they are already in the same set or not simply by returning a boolean value.
bool unite(int x, int y) {
x = get(x);
y = get(y);
if (x != y) {
if (x < y) root[y] = x;
else root[x] = y;
return true;
}
return false;
}
If we need to check whether two elements have the same parent, then we need a function to check it. To implement that, we simply check if the target element is , otherwise we can call the same function recursively until we have the root. In other word, the parent would be
int get(int x) {
return x == root[x] ? x : get(root[x]);
}
However, the above implementation is not efficient as each call depends on while we need to optimize it nearly constant time.
One way to optimize it is compress the path. For example, if the root element is and we have the chain like -> -> -> . If we write it vertically, element is on the top level, element is on the second level, element is on the third level and so on. We can compress these into the same level, i.e. element , and would be on the second level only so that we don't need to talk all the nodes between the root and the source. This would achieve per call on average.
int get(int x) {
return (x == root[x] ? x : (root[x] = get(root[x])));
}
We can futher optimize using union by rank. In the previous implementation, we always join the second one to the first one. However, we can choose the best side to make it faster. We can base on the depth of the trees to determine which side we would like to attach.
bool unite(int x, int y) {
x = get(x);
y = get(y);
if (x != y) {
if (rank[x] > rank[y]) {
root[y] = x;
} else if (rank[x] < rank[y]) {
root[x] = y;
} else {
root[y] = x;
rank[x] += 1;
}
cnt--;
return true;
}
return false;
}
Here's the final templatized version.
class dsu {
public:
vector<int> root, rank;
int n;
int cnt;
dsu(int _n) : n(_n) {
root.resize(n);
rank.resize(n);
for(int i = 0; i < n; i++) {
root[i] = i;
rank[i] = 1;
}
cnt = n;
}
inline int getCount() { return cnt; }
inline int get(int x) { return (x == root[x] ? x : (root[x] = get(root[x]))); }
inline bool unite(int x, int y) {
x = get(x);
y = get(y);
if (x != y) {
if (rank[x] > rank[y]) {
root[y] = x;
} else if (rank[x] < rank[y]) {
root[x] = y;
} else {
root[y] = x;
rank[x] += 1;
}
cnt--;
return true;
}
return false;
}
};
Here's some basic usages.
int main() {
int n = 10;
// init
dsu d = dsu(n);
// unite
d.unite(1, 2);
d.unite(3, 4);
d.unite(1, 4);
// get the parent
int p = d.get(1);
return 0;
}
Suggested Problems
Problem Name | Difficulty | Solution Link |
---|---|---|
1061. Lexicographically Smallest Equivalent String | Medium | View Solutions |
2421. Number of Good Paths | Hard | View Solutions |
2382. Maximum Segment Sum After Removals | Hard | N/A |