04. Hash Maps and Sets

📋 Jump to Takeaways

Hash maps solve the biggest limitation of lists: finding a value by content instead of by position. Lists need O(n) to search. Dicts do it in O(1). This single property makes them the most-used data structure in real code and interviews alike.

How Hash Maps Work

A hash map stores key-value pairs. You give it a key, it runs a hash function to compute an index, and stores the value at that index in an internal array.

key "alice" → hash("alice") → 3 → store at index 3
key "bob"   → hash("bob")   → 7 → store at index 7

Lookup "alice" → hash("alice") → 3 → found at index 3, O(1)

The hash function converts any key into an integer. A good hash function distributes keys evenly across the array. A bad one puts everything in the same slot (collision), degrading to O(n).

Python's dict

Python's built-in dict is a hash map. Since Python 3.7, dicts maintain insertion order (unlike Go maps):

ages = {
    "alice": 30,
    "bob": 25,
}

ages["carol"] = 28       # insert — O(1)
age = ages["alice"]      # lookup — O(1)
del ages["bob"]          # delete — O(1)

# Check existence
if "dave" in ages:
    print(ages["dave"])

# Safe access with default
age = ages.get("dave", 0)  # returns 0 if key missing

Operation Costs

Operation	Average	Worst
`d[key]`	O(1)	O(n)
`d[key] = val`	O(1)	O(n)
`del d[key]`	O(1)	O(n)
`key in d`	O(1)	O(n)
Iterate all	O(n)	O(n)

Worst case happens when all keys hash to the same bucket. In practice with Python's built-in dict, this essentially never happens. Treat it as O(1).

defaultdict — No KeyError

defaultdict auto-creates missing keys with a factory function. Eliminates the "check-then-set" pattern:

from collections import defaultdict

# Without defaultdict — verbose
graph = {}
for u, v in edges:
    if u not in graph:
        graph[u] = []
    graph[u].append(v)

# With defaultdict — clean
graph = defaultdict(list)
for u, v in edges:
    graph[u].append(v)  # auto-creates empty list if key missing

Common factories:

defaultdict(int) → missing keys default to 0 (great for counting)
defaultdict(list) → missing keys default to [] (great for grouping)
defaultdict(set) → missing keys default to set()

# Frequency counting with defaultdict
freq = defaultdict(int)
for ch in "hello":
    freq[ch] += 1  # no KeyError, starts at 0
# {'h': 1, 'e': 1, 'l': 2, 'o': 1}

Counter — Purpose-Built Frequency Map

Counter is a dict subclass optimized for counting. It's the go-to for frequency problems:

from collections import Counter

# Count anything iterable
freq = Counter("aabbbcc")  # Counter({'b': 3, 'a': 2, 'c': 2})
freq = Counter([1, 2, 2, 3, 3, 3])  # Counter({3: 3, 2: 2, 1: 1})

# Useful methods
freq.most_common(2)  # [(3, 3), (2, 2)] — top 2 by count
freq.total()         # sum of all counts (Python 3.10+)

# Arithmetic
a = Counter("aaab")
b = Counter("ab")
print(a - b)    # Counter({'a': 2}) — subtract counts, drop zeros
print(a & b)    # Counter({'a': 1, 'b': 1}) — min of each
print(a | b)    # Counter({'a': 3, 'b': 1}) — max of each

Counter vs defaultdict(int):

Same O(1) operations
Counter adds most_common(), arithmetic, and one-shot construction
Use Counter for frequency problems, defaultdict(int) for general counting in loops

See also: Top K Frequent Elements — three approaches beyond most_common(): sort, min-heap, and O(n) bucket sort.

Sets

A set is a collection of unique values with O(1) membership testing:

seen = set()

seen.add("alice")       # add — O(1)
seen.discard("alice")   # remove (no error if missing) — O(1)
seen.remove("alice")    # remove (KeyError if missing) — O(1)

if "bob" in seen:       # membership test — O(1)
    print("found")

Set Operations

Python sets support mathematical set operations — all O(min(len(a), len(b))) or O(len(a) + len(b)):

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

a | b   # union:        {1, 2, 3, 4, 5, 6}
a & b   # intersection: {3, 4}
a - b   # difference:   {1, 2}
a ^ b   # symmetric difference: {1, 2, 5, 6}

a.issubset(b)     # False
a.issuperset(b)   # False
a.isdisjoint(b)   # False (they share elements)

Set Construction from Iterables

# Deduplicate a list (loses order)
unique = set([1, 2, 2, 3, 3])  # {1, 2, 3}

# Deduplicate preserving order
seen = set()
unique_ordered = []
for x in items:
    if x not in seen:
        seen.add(x)
        unique_ordered.append(x)

# Set comprehension
evens = {x for x in range(20) if x % 2 == 0}

frozenset — Hashable Sets

Regular sets are mutable, so they can't be dict keys or members of other sets. frozenset is immutable and hashable:

# Use frozenset as a dict key
cache = {}
key = frozenset([1, 2, 3])
cache[key] = "result"

# Set of sets
groups = set()
groups.add(frozenset([1, 2]))
groups.add(frozenset([3, 4]))

# Membership test
frozenset([1, 2]) in groups  # True

Use frozenset when you need to use a set as a key for memoization or deduplication of states (common in BFS/DFS problems).

What Can Be a Key?

Only hashable types can be dict keys or set members. Hashable means immutable with a consistent hash:

Hashable (can be key)	Not hashable (can't be key)
`int`, `float`, `str`	`list`
`tuple` (if contents are hashable)	`dict`
`frozenset`	`set`
`bool`, `None`	any mutable object

# Tuple as key — common for coordinate problems
grid = {}
grid[(0, 0)] = "start"
grid[(1, 2)] = "wall"

# Convert list to tuple for hashing
state = [1, 2, 3]
seen = set()
seen.add(tuple(state))

Pattern: Two Sum (Complement Lookup)

Store values you've seen so you can find complements in O(1):

LeetCode 1 - Two Sum

def two_sum(nums: list[int], target: int) -> list[int]:
    seen = {}  # value → index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

Without a dict, this is O(n²) — check every pair. With a dict, one pass.

Pattern: Frequency Counting

LeetCode 242 - Valid Anagram

from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)

LeetCode 347 - Top K Frequent Elements

from collections import Counter

def top_k_frequent(nums: list[int], k: int) -> list[int]:
    return [x for x, _ in Counter(nums).most_common(k)]  # most_common returns [(elem, count)], we extract just elements

Pattern: Deduplication / Seen Set

LeetCode 128 - Longest Consecutive Sequence

def longest_consecutive(nums: list[int]) -> int:
    num_set = set(nums)
    best = 0
    for n in num_set:
        if n - 1 not in num_set:  # only start counting from sequence starts
            length = 1
            while n + length in num_set:
                length += 1
            best = max(best, length)
    return best

O(n) — each number is visited at most twice (once in the outer loop, once in a while chain).

Pattern: Grouping

Group elements by a computed key:

LeetCode 49 - Group Anagrams

from collections import defaultdict

def group_anagrams(strs: list[str]) -> list[list[str]]:
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))
        groups[key].append(s)
    return list(groups.values())

Using a frequency tuple as key (avoids O(k log k) sort per word):

def group_anagrams(strs: list[str]) -> list[list[str]]:
    groups = defaultdict(list)
    for s in strs:
        key = [0] * 26
        for ch in s:
            key[ord(ch) - ord('a')] += 1
        groups[tuple(key)].append(s)
    return list(groups.values())

Pattern: Prefix Sum + Hash Map

Combine prefix sums with a hash map to find subarrays with a given sum:

LeetCode 560 - Subarray Sum Equals K

def subarray_sum(nums: list[int], k: int) -> int:
    prefix_count = {0: 1}  # prefix_sum → count of times seen
    current_sum = 0
    result = 0
    for n in nums:
        current_sum += n
        result += prefix_count.get(current_sum - k, 0)
        prefix_count[current_sum] = prefix_count.get(current_sum, 0) + 1
    return result

When to Use Hash Maps

You need O(1) lookup by key
Counting frequencies
Detecting duplicates
Caching computed results (memoization)
Mapping relationships (graph adjacency lists)
Prefix sum problems (sum → count mapping)

When NOT to Use Hash Maps

You need sorted order → use SortedDict from sortedcontainers or bisect
Keys are small integers (0 to n) → use a list (faster, less memory)
You need range queries ("all keys between 5 and 10") → use a sorted structure
Memory is extremely tight → dicts have ~50-70 bytes overhead per entry

dict vs defaultdict vs Counter

Feature	`dict`	`defaultdict`	`Counter`
Missing key	`KeyError`	Auto-creates	Returns 0
Best for	General mapping	Grouping/building	Frequency counting
`most_common()`	No	No	Yes
Arithmetic (`+`, `-`)	No	No	Yes
Construction from iterable	`dict.fromkeys()`	Manual	`Counter(iterable)`

Practice Problems

Problem	Difficulty	Link
Two Sum	Easy	LeetCode 1
Valid Anagram	Easy	LeetCode 242
Group Anagrams	Medium	LeetCode 49
Top K Frequent Elements	Medium	LeetCode 347
Longest Consecutive Sequence	Medium	LeetCode 128
Subarray Sum Equals K	Medium	LeetCode 560
Destination City	Easy	LeetCode 1436
Contains Duplicate II	Easy	LeetCode 219
Isomorphic Strings	Easy	LeetCode 205

Key Takeaways

dict gives O(1) average lookup, insert, and delete by hashing keys to array indices
Python dicts maintain insertion order (3.7+) — unlike Go maps
defaultdict eliminates check-then-set boilerplate for grouping and counting
Counter is purpose-built for frequencies: one-shot construction, most_common(), arithmetic
Use set for O(1) membership testing and deduplication
frozenset is hashable — use it as a dict key or in other sets
Only hashable (immutable) types can be keys — convert lists to tuples, sets to frozensets
Core patterns: complement lookup (Two Sum), frequency counting, grouping, prefix sum + hash map
If you need order or range queries, a dict is the wrong choice