04. Hash Maps and Sets

📋 Jump to Takeaways

Hash maps solve the biggest limitation of lists: finding a value by content instead of by position. Lists need O(n) to search. Dicts do it in O(1). This single property makes them the most-used data structure in real code and interviews alike.

How Hash Maps Work

A hash map stores key-value pairs. You give it a key, it runs a hash function to compute an index, and stores the value at that index in an internal array.

key "alice" → hash("alice") → 3 → store at index 3
key "bob"   → hash("bob")   → 7 → store at index 7

Lookup "alice" → hash("alice") → 3 → found at index 3, O(1)

The hash function converts any key into an integer. A good hash function distributes keys evenly across the array. A bad one puts everything in the same slot (collision), degrading to O(n).

Python's dict

Python's built-in dict is a hash map. Since Python 3.7, dicts maintain insertion order (unlike Go maps):

ages = {
    "alice": 30,
    "bob": 25,
}

ages["carol"] = 28       # insert — O(1)
age = ages["alice"]      # lookup — O(1)
del ages["bob"]          # delete — O(1)

# Check existence
if "dave" in ages:
    print(ages["dave"])

# Safe access with default
age = ages.get("dave", 0)  # returns 0 if key missing

Operation Costs

Operation Average Worst
d[key] O(1) O(n)
d[key] = val O(1) O(n)
del d[key] O(1) O(n)
key in d O(1) O(n)
Iterate all O(n) O(n)

Worst case happens when all keys hash to the same bucket. In practice with Python's built-in dict, this essentially never happens. Treat it as O(1).

defaultdict — No KeyError

defaultdict auto-creates missing keys with a factory function. Eliminates the "check-then-set" pattern:

from collections import defaultdict

# Without defaultdict — verbose
graph = {}
for u, v in edges:
    if u not in graph:
        graph[u] = []
    graph[u].append(v)

# With defaultdict — clean
graph = defaultdict(list)
for u, v in edges:
    graph[u].append(v)  # auto-creates empty list if key missing

Common factories:

  • defaultdict(int) → missing keys default to 0 (great for counting)
  • defaultdict(list) → missing keys default to [] (great for grouping)
  • defaultdict(set) → missing keys default to set()
# Frequency counting with defaultdict
freq = defaultdict(int)
for ch in "hello":
    freq[ch] += 1  # no KeyError, starts at 0
# {'h': 1, 'e': 1, 'l': 2, 'o': 1}

Counter — Purpose-Built Frequency Map

Counter is a dict subclass optimized for counting. It's the go-to for frequency problems:

from collections import Counter

# Count anything iterable
freq = Counter("aabbbcc")  # Counter({'b': 3, 'a': 2, 'c': 2})
freq = Counter([1, 2, 2, 3, 3, 3])  # Counter({3: 3, 2: 2, 1: 1})

# Useful methods
freq.most_common(2)  # [(3, 3), (2, 2)] — top 2 by count
freq.total()         # sum of all counts (Python 3.10+)

# Arithmetic
a = Counter("aaab")
b = Counter("ab")
print(a - b)    # Counter({'a': 2}) — subtract counts, drop zeros
print(a & b)    # Counter({'a': 1, 'b': 1}) — min of each
print(a | b)    # Counter({'a': 3, 'b': 1}) — max of each

Counter vs defaultdict(int):

  • Same O(1) operations
  • Counter adds most_common(), arithmetic, and one-shot construction
  • Use Counter for frequency problems, defaultdict(int) for general counting in loops

See also: Top K Frequent Elements — three approaches beyond most_common(): sort, min-heap, and O(n) bucket sort.

Sets

A set is a collection of unique values with O(1) membership testing:

seen = set()

seen.add("alice")       # add — O(1)
seen.discard("alice")   # remove (no error if missing) — O(1)
seen.remove("alice")    # remove (KeyError if missing) — O(1)

if "bob" in seen:       # membership test — O(1)
    print("found")

Set Operations

Python sets support mathematical set operations — all O(min(len(a), len(b))) or O(len(a) + len(b)):

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

a | b   # union:        {1, 2, 3, 4, 5, 6}
a & b   # intersection: {3, 4}
a - b   # difference:   {1, 2}
a ^ b   # symmetric difference: {1, 2, 5, 6}

a.issubset(b)     # False
a.issuperset(b)   # False
a.isdisjoint(b)   # False (they share elements)

Set Construction from Iterables

# Deduplicate a list (loses order)
unique = set([1, 2, 2, 3, 3])  # {1, 2, 3}

# Deduplicate preserving order
seen = set()
unique_ordered = []
for x in items:
    if x not in seen:
        seen.add(x)
        unique_ordered.append(x)

# Set comprehension
evens = {x for x in range(20) if x % 2 == 0}

frozenset — Hashable Sets

Regular sets are mutable, so they can't be dict keys or members of other sets. frozenset is immutable and hashable:

# Use frozenset as a dict key
cache = {}
key = frozenset([1, 2, 3])
cache[key] = "result"

# Set of sets
groups = set()
groups.add(frozenset([1, 2]))
groups.add(frozenset([3, 4]))

# Membership test
frozenset([1, 2]) in groups  # True

Use frozenset when you need to use a set as a key for memoization or deduplication of states (common in BFS/DFS problems).

What Can Be a Key?

Only hashable types can be dict keys or set members. Hashable means immutable with a consistent hash:

Hashable (can be key) Not hashable (can't be key)
int, float, str list
tuple (if contents are hashable) dict
frozenset set
bool, None any mutable object
# Tuple as key — common for coordinate problems
grid = {}
grid[(0, 0)] = "start"
grid[(1, 2)] = "wall"

# Convert list to tuple for hashing
state = [1, 2, 3]
seen = set()
seen.add(tuple(state))

Pattern: Two Sum (Complement Lookup)

Store values you've seen so you can find complements in O(1):

LeetCode 1 - Two Sum

def two_sum(nums: list[int], target: int) -> list[int]:
    seen = {}  # value → index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

Without a dict, this is O(n²) — check every pair. With a dict, one pass.

Pattern: Frequency Counting

LeetCode 242 - Valid Anagram

from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)

LeetCode 347 - Top K Frequent Elements

from collections import Counter

def top_k_frequent(nums: list[int], k: int) -> list[int]:
    return [x for x, _ in Counter(nums).most_common(k)]  # most_common returns [(elem, count)], we extract just elements

Pattern: Deduplication / Seen Set

LeetCode 128 - Longest Consecutive Sequence

def longest_consecutive(nums: list[int]) -> int:
    num_set = set(nums)
    best = 0
    for n in num_set:
        if n - 1 not in num_set:  # only start counting from sequence starts
            length = 1
            while n + length in num_set:
                length += 1
            best = max(best, length)
    return best

O(n) — each number is visited at most twice (once in the outer loop, once in a while chain).

Pattern: Grouping

Group elements by a computed key:

LeetCode 49 - Group Anagrams

from collections import defaultdict

def group_anagrams(strs: list[str]) -> list[list[str]]:
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))
        groups[key].append(s)
    return list(groups.values())

Using a frequency tuple as key (avoids O(k log k) sort per word):

def group_anagrams(strs: list[str]) -> list[list[str]]:
    groups = defaultdict(list)
    for s in strs:
        key = [0] * 26
        for ch in s:
            key[ord(ch) - ord('a')] += 1
        groups[tuple(key)].append(s)
    return list(groups.values())

Pattern: Prefix Sum + Hash Map

Combine prefix sums with a hash map to find subarrays with a given sum:

LeetCode 560 - Subarray Sum Equals K

def subarray_sum(nums: list[int], k: int) -> int:
    prefix_count = {0: 1}  # prefix_sum → count of times seen
    current_sum = 0
    result = 0
    for n in nums:
        current_sum += n
        result += prefix_count.get(current_sum - k, 0)
        prefix_count[current_sum] = prefix_count.get(current_sum, 0) + 1
    return result

When to Use Hash Maps

  • You need O(1) lookup by key
  • Counting frequencies
  • Detecting duplicates
  • Caching computed results (memoization)
  • Mapping relationships (graph adjacency lists)
  • Prefix sum problems (sum → count mapping)

When NOT to Use Hash Maps

  • You need sorted order → use SortedDict from sortedcontainers or bisect
  • Keys are small integers (0 to n) → use a list (faster, less memory)
  • You need range queries ("all keys between 5 and 10") → use a sorted structure
  • Memory is extremely tight → dicts have ~50-70 bytes overhead per entry

dict vs defaultdict vs Counter

Feature dict defaultdict Counter
Missing key KeyError Auto-creates Returns 0
Best for General mapping Grouping/building Frequency counting
most_common() No No Yes
Arithmetic (+, -) No No Yes
Construction from iterable dict.fromkeys() Manual Counter(iterable)

Practice Problems

Problem Difficulty Link
Two Sum Easy LeetCode 1
Valid Anagram Easy LeetCode 242
Group Anagrams Medium LeetCode 49
Top K Frequent Elements Medium LeetCode 347
Longest Consecutive Sequence Medium LeetCode 128
Subarray Sum Equals K Medium LeetCode 560
Destination City Easy LeetCode 1436
Contains Duplicate II Easy LeetCode 219
Isomorphic Strings Easy LeetCode 205

Key Takeaways

  • dict gives O(1) average lookup, insert, and delete by hashing keys to array indices
  • Python dicts maintain insertion order (3.7+) — unlike Go maps
  • defaultdict eliminates check-then-set boilerplate for grouping and counting
  • Counter is purpose-built for frequencies: one-shot construction, most_common(), arithmetic
  • Use set for O(1) membership testing and deduplication
  • frozenset is hashable — use it as a dict key or in other sets
  • Only hashable (immutable) types can be keys — convert lists to tuples, sets to frozensets
  • Core patterns: complement lookup (Two Sum), frequency counting, grouping, prefix sum + hash map
  • If you need order or range queries, a dict is the wrong choice

📖 Examples

Complete examples for this lesson.

📝 Ready to test your knowledge?

Answer the quiz below to mark this lesson complete.

Spot something off? Report an issue

© 2026 ByteLearn.dev. Free courses for developers. · Privacy