Counting Connected Components
Number of Connected Components in an Undirected Graph
Given n
nodes labeled from 0
to n - 1
and a list of undirected edges (each edge is a pair of nodes),
find the number of connected components in the undirected graph.
Example 1:
0 – 1 – 2
3 – 4
Given
n = 5
andedges = [[0, 1], [1, 2], [3, 4]]
, return2
.
Example 2:
0 – 1 – 2 – 3 – 4
Given
n = 5
andedges = [[0, 1], [1, 2], [2, 3], [3, 4]]
, return1
.
Note:
You can assume that no duplicate edges will appear in edges. Since all edges are undirected, [0, 1]
is the same as [1, 0]
and thus will not appear together in edges.
Idea
- Construct the graph from
edges
- Initialize
components = 0
- Start at an unvisited node, traverse the connected component that contains this node and mark every node in this component as visited
- Increment
components
by 1 - Repeat 3 and 4 until every node has been visited.
- Return
components
Before Implementing …
How do we represent a graph?
We want to represent a graph in such a way that supports fast neighbor lookup (like a getNeighbors()
method). So we can either use a HashMap to map a node to its neighbors or write a GraphNode
class:
class GraphNode:
def __init__(self, id):
self._id = id
self._neighbors = []
def add_neighbor(self, neighbor):
self._neighbors.append(neighbor)
@property
def neighbors(self):
return self._neighbors
This is a good opportunity to discuss with your interviewer about the trade-offs between the two implementations.
HashTable:
- Pro: Easy to implement
- Con: Code is not self-explanatory
GraphNode[^1]:
- Pro: Good design, self-explanatory code
- Con: Extra overhead
[^1]: An important note about using a GraphNode
class: after initializing GraphNode(0), GraphNode(1), ..., GraphNode(n-1)
, when we see [2, 3]
, we want to add GraphNode(3)
to GraphNode(2).neighbors
and vice versa. GraphNode(2)
and GraphNode(3)
have been created – how do we find them? I.e, given some i
, how do we store the GraphNode
's so we can quickly look up the GraphNode(i)
that has been created?
Solution 1: Store GraphNode(0)
,GraphNode(1)
,…, GraphNode(n-1)
in an array :
python nodes = [GraphNode(i) for i in range(n)]
Solution 2: Use a dictionary to map i
to GraphNode(i)
:
python nodes = {i: GraphNode(i) for i in range(n)}
In this case when the node ids range from 0 to n-1
, solution 1 is slightly better since the dictionary solution uses more space. In other cases, however, when the node ids are strings or integers but all are sparsely distributed within some range, using an array to store all the nodes will be either impossible or a huge waste of space.
Seeing that you understand how to use a GraphNode
class to construct the graph, the interviewer will likely suggest you use a HashMap to represent the graph to save some time.
How do we traverse a connected component?
Both BFS and DFS will work.
How do we store the visited nodes?
In our algorithm, we need a way to retrieve an unvisited node. So instead of keeping track the visited nodes, we use a set to store the unvisited nodes and we can get retrive an unvisited node using unvisited.pop()
.
Implementation
Main code
def countComponents(n, edges):
children = {i: [] for i in range(n)}
for a, b in edges:
children[a].append(b)
children[b].append(a)
components = 0
unvisited = set(range(n))
while unvisited:
somenode = unvisited.pop()
dfs(somenode, children, unvisited) # bfs also works here
components += 1
return components
DFS
def dfs(root, children, unvisited):
stack = [root]
while stack:
current = stack.pop()
for x in children[current]:
if x in unvisited:
unvisited.discard(x)
stack.append(x)
Note: set.discard(x)
can be replaced by set.remove(x)
. The difference between discard(x)
and remove(x)
is that set.remove(x)
will raise KeyError
if x
is not present.
BFS
from collections import deque
def bfs(root, children, unvisited):
q = deque([root])
while q:
current = q.popleft()
for x in children[current]:
if x in unvisited:
unvisited.discard(x)
q.append(x)
A Union Find Solution
A Disjoint Set Union (DSU) data structure[^2] can also be used to find the number of connected components in a graph. We’ll cover DSU in detail in a future post.
[^2]: Also called Union Find because of its two main operations.