14. Graphs

📋 Jump to Takeaways

Graphs model relationships. Social networks (people connected by friendships), maps (cities connected by roads), dependencies (tasks that must happen before other tasks), the internet (pages connected by links). Any time you have "things" and "connections between things," you have a graph.

The Structure

A graph has vertices (nodes) and edges (connections). Edges can be:

Directed — one-way (A → B doesn't mean B → A)
Undirected — two-way (A — B means both can reach each other)
Weighted — edges have costs (road distances, network latency)
Unweighted — all edges are equal

Undirected:          Directed:           Weighted:
  A — B              A → B              A —3— B
  |   |              ↓   ↓              |     |
  C — D              C → D              2     5
                                        |     |
                                        C —1— D

Representation

Two standard ways to store a graph:

Adjacency list — each node stores its neighbors. Best for sparse graphs (few edges relative to nodes). Python's defaultdict(list) is the idiomatic choice.

from collections import defaultdict

# Adjacency list — defaultdict avoids KeyError on missing nodes
graph = defaultdict(list)
graph[0] = [1, 2]
graph[1] = [0, 3]
graph[2] = [0, 3]
graph[3] = [1, 2]

# Building from edge list
edges = [[0, 1], [0, 2], [1, 3], [2, 3]]
graph = defaultdict(list)
for u, v in edges:
    graph[u].append(v)
    graph[v].append(u)  # omit for directed graph

# Weighted adjacency list
weighted = defaultdict(list)
weighted[0] = [(1, 3), (2, 2)]  # (neighbor, weight)
weighted[1] = [(3, 5)]
weighted[2] = [(3, 1)]

Adjacency matrix — 2D array where matrix[i][j] = 1 means edge from i to j. Best for dense graphs or when you need O(1) edge lookup.

# Adjacency matrix — O(1) edge check, O(V²) space
matrix = [
    [0, 1, 1, 0],  # node 0 connects to 1, 2
    [1, 0, 0, 1],  # node 1 connects to 0, 3
    [1, 0, 0, 1],  # node 2 connects to 0, 3
    [0, 1, 1, 0],  # node 3 connects to 1, 2
]

	Adjacency List	Adjacency Matrix
Space	O(V + E)	O(V²)
Check edge exists	O(degree)	O(1)
Find all neighbors	O(degree)	O(V)
Best for	Sparse graphs	Dense graphs

Most interview problems use adjacency lists.

DFS — Depth-First Search

Go as deep as possible before backtracking. Uses a stack (or recursion).

# DFS — recursive
def dfs(graph: dict, node: int, visited: set, order: list):
    visited.add(node)
    order.append(node)
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited, order)

# DFS — iterative with explicit stack
def dfs_iterative(graph: dict, start: int) -> list[int]:
    visited = set()
    stack = [start]
    order = []
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                stack.append(neighbor)
    return order

DFS explores one path completely before trying another. Good for: detecting cycles, topological sort, finding connected components, path existence.

BFS — Breadth-First Search

Explore all neighbors at distance 1, then distance 2, then distance 3. Uses a deque for O(1) popleft.

from collections import deque

# BFS — finds shortest path in unweighted graphs
def bfs(graph: dict, start: int) -> dict[int, int]:
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist  # distance from start to every reachable node

BFS guarantees shortest path in unweighted graphs. The first time you reach a node is the shortest way to get there.

Important: always use collections.deque for BFS. A regular list with pop(0) is O(n) — it shifts every element. deque.popleft() is O(1).

Multi-Source BFS

Sometimes you start BFS from multiple sources simultaneously. Enqueue ALL sources at once and expand outward in parallel.

LeetCode 994 - Rotting Oranges

# Rotting oranges: how many minutes until all oranges rot?
def oranges_rotting(grid: list[list[int]]) -> int:
    rows, cols = len(grid), len(grid[0])
    queue = deque()
    fresh = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c))
            elif grid[r][c] == 1:
                fresh += 1

    minutes = 0
    dirs = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    while queue and fresh > 0:
        for _ in range(len(queue)):
            r, c = queue.popleft()
            for dr, dc in dirs:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                    grid[nr][nc] = 2
                    fresh -= 1
                    queue.append((nr, nc))
        minutes += 1

    return -1 if fresh > 0 else minutes

The key insight: by starting from all sources at once, each cell gets the minimum distance to its nearest source. One BFS pass, O(V+E) total.

Topological Sort

Topological sort orders nodes in a directed acyclic graph (DAG) so that for every edge A → B, A comes before B. Think: course prerequisites, build dependencies, task scheduling.

Kahn's Algorithm (BFS-based):

Count in-degrees (how many edges point to each node)
Enqueue all nodes with in-degree 0 (no dependencies)
Process queue: for each node, reduce in-degree of its neighbors. If a neighbor hits 0, enqueue it.

LeetCode 210 - Course Schedule II

from collections import deque, defaultdict

def topological_sort(n: int, edges: list[list[int]]) -> list[int]:
    graph = defaultdict(list)
    in_degree = [0] * n
    for u, v in edges:
        graph[u].append(v)
        in_degree[v] += 1

    queue = deque(i for i in range(n) if in_degree[i] == 0)
    order = []

    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    if len(order) != n:
        return []  # cycle detected — not a DAG
    return order

If the result has fewer nodes than the graph, there's a cycle — topological sort is impossible. This is also how you detect cycles in directed graphs.

Connected Components

How many separate groups exist? Run DFS/BFS from each unvisited node, count how many times you start.

LeetCode 323 - Number of Connected Components in an Undirected Graph

def count_components(n: int, graph: dict) -> int:
    visited = set()
    count = 0

    def explore(node: int):
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                explore(neighbor)

    for i in range(n):
        if i not in visited:
            explore(i)
            count += 1
    return count

When DFS vs BFS

Use DFS when	Use BFS when
Detecting cycles	Finding shortest path (unweighted)
Topological sort	Level-order processing
Path existence (any path)	Nearest neighbor queries
Exploring all possibilities	Minimum steps/moves
Space is tight (recursion stack)	You need the closest result first

Cycle detection: In a directed graph, if DFS visits a node that's currently on the recursion stack (not just visited before), there's a cycle. This is how topological sort detects invalid DAGs — if len(order) != n, a cycle exists.

Rule of thumb: if the problem says "minimum," "shortest," or "fewest," think BFS. If it says "all paths," "detect cycle," or "can you reach," think DFS.

Grid as Graph

Many problems give you a 2D grid. Each cell is a node, adjacent cells (up/down/left/right) are edges.

LeetCode 200 - Number of Islands

def num_islands(grid: list[list[str]]) -> int:
    rows, cols = len(grid), len(grid[0])
    count = 0

    def sink(r: int, c: int):
        if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] == "0":
            return
        grid[r][c] = "0"  # mark visited
        sink(r + 1, c)
        sink(r - 1, c)
        sink(r, c + 1)
        sink(r, c - 1)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "1":
                sink(r, c)
                count += 1
    return count

Complexity

Algorithm	Time	Space
DFS	O(V + E)	O(V)
BFS	O(V + E)	O(V)
Topological sort	O(V + E)	O(V)

V = vertices, E = edges. Both DFS and BFS visit every node and every edge exactly once.

Practice Problems

Problem	Difficulty	Link
Flood Fill	Easy	LeetCode 733
Number of Islands	Medium	LeetCode 200
All Paths From Source to Target	Medium	LeetCode 797
Clone Graph	Medium	LeetCode 133
Course Schedule	Medium	LeetCode 207
Course Schedule II	Medium	LeetCode 210
Pacific Atlantic Water Flow	Medium	LeetCode 417
Word Ladder	Hard	LeetCode 127
Rotting Oranges	Medium	LeetCode 994

Key Takeaways

Graphs model relationships — vertices are things, edges are connections
defaultdict(list) is the idiomatic Python adjacency list
DFS goes deep first (stack/recursion) — cycles, topological sort, path existence
BFS goes wide first (deque) — shortest path in unweighted graphs, minimum steps
Always use deque for BFS — list.pop(0) is O(n), deque.popleft() is O(1)
"Minimum/shortest/fewest" → BFS. "All paths/cycle/reachability" → DFS
Grids are graphs — each cell connects to its 4 (or 8) neighbors
Kahn's algorithm with deque is the cleanest topological sort for interviews
Both DFS and BFS are O(V + E) time