04. Hash Maps and Sets
📋 Jump to TakeawaysHash maps solve the biggest limitation of lists: finding a value by content instead of by position. Lists need O(n) to search. Dicts do it in O(1). This single property makes them the most-used data structure in real code and interviews alike.
How Hash Maps Work
A hash map stores key-value pairs. You give it a key, it runs a hash function to compute an index, and stores the value at that index in an internal array.
key "alice" → hash("alice") → 3 → store at index 3
key "bob" → hash("bob") → 7 → store at index 7
Lookup "alice" → hash("alice") → 3 → found at index 3, O(1)The hash function converts any key into an integer. A good hash function distributes keys evenly across the array. A bad one puts everything in the same slot (collision), degrading to O(n).
Python's dict
Python's built-in dict is a hash map. Since Python 3.7, dicts maintain insertion order (unlike Go maps):
ages = {
"alice": 30,
"bob": 25,
}
ages["carol"] = 28 # insert — O(1)
age = ages["alice"] # lookup — O(1)
del ages["bob"] # delete — O(1)
# Check existence
if "dave" in ages:
print(ages["dave"])
# Safe access with default
age = ages.get("dave", 0) # returns 0 if key missingOperation Costs
| Operation | Average | Worst |
|---|---|---|
d[key] |
O(1) | O(n) |
d[key] = val |
O(1) | O(n) |
del d[key] |
O(1) | O(n) |
key in d |
O(1) | O(n) |
| Iterate all | O(n) | O(n) |
Worst case happens when all keys hash to the same bucket. In practice with Python's built-in dict, this essentially never happens. Treat it as O(1).
defaultdict — No KeyError
defaultdict auto-creates missing keys with a factory function. Eliminates the "check-then-set" pattern:
from collections import defaultdict
# Without defaultdict — verbose
graph = {}
for u, v in edges:
if u not in graph:
graph[u] = []
graph[u].append(v)
# With defaultdict — clean
graph = defaultdict(list)
for u, v in edges:
graph[u].append(v) # auto-creates empty list if key missingCommon factories:
defaultdict(int)→ missing keys default to 0 (great for counting)defaultdict(list)→ missing keys default to [] (great for grouping)defaultdict(set)→ missing keys default to set()
# Frequency counting with defaultdict
freq = defaultdict(int)
for ch in "hello":
freq[ch] += 1 # no KeyError, starts at 0
# {'h': 1, 'e': 1, 'l': 2, 'o': 1}Counter — Purpose-Built Frequency Map
Counter is a dict subclass optimized for counting. It's the go-to for frequency problems:
from collections import Counter
# Count anything iterable
freq = Counter("aabbbcc") # Counter({'b': 3, 'a': 2, 'c': 2})
freq = Counter([1, 2, 2, 3, 3, 3]) # Counter({3: 3, 2: 2, 1: 1})
# Useful methods
freq.most_common(2) # [(3, 3), (2, 2)] — top 2 by count
freq.total() # sum of all counts (Python 3.10+)
# Arithmetic
a = Counter("aaab")
b = Counter("ab")
print(a - b) # Counter({'a': 2}) — subtract counts, drop zeros
print(a & b) # Counter({'a': 1, 'b': 1}) — min of each
print(a | b) # Counter({'a': 3, 'b': 1}) — max of eachCounter vs defaultdict(int):
- Same O(1) operations
Counteraddsmost_common(), arithmetic, and one-shot construction- Use
Counterfor frequency problems,defaultdict(int)for general counting in loops
See also: Top K Frequent Elements — three approaches beyond
most_common(): sort, min-heap, and O(n) bucket sort.
Sets
A set is a collection of unique values with O(1) membership testing:
seen = set()
seen.add("alice") # add — O(1)
seen.discard("alice") # remove (no error if missing) — O(1)
seen.remove("alice") # remove (KeyError if missing) — O(1)
if "bob" in seen: # membership test — O(1)
print("found")Set Operations
Python sets support mathematical set operations — all O(min(len(a), len(b))) or O(len(a) + len(b)):
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
a | b # union: {1, 2, 3, 4, 5, 6}
a & b # intersection: {3, 4}
a - b # difference: {1, 2}
a ^ b # symmetric difference: {1, 2, 5, 6}
a.issubset(b) # False
a.issuperset(b) # False
a.isdisjoint(b) # False (they share elements)Set Construction from Iterables
# Deduplicate a list (loses order)
unique = set([1, 2, 2, 3, 3]) # {1, 2, 3}
# Deduplicate preserving order
seen = set()
unique_ordered = []
for x in items:
if x not in seen:
seen.add(x)
unique_ordered.append(x)
# Set comprehension
evens = {x for x in range(20) if x % 2 == 0}frozenset — Hashable Sets
Regular sets are mutable, so they can't be dict keys or members of other sets. frozenset is immutable and hashable:
# Use frozenset as a dict key
cache = {}
key = frozenset([1, 2, 3])
cache[key] = "result"
# Set of sets
groups = set()
groups.add(frozenset([1, 2]))
groups.add(frozenset([3, 4]))
# Membership test
frozenset([1, 2]) in groups # TrueUse frozenset when you need to use a set as a key for memoization or deduplication of states (common in BFS/DFS problems).
What Can Be a Key?
Only hashable types can be dict keys or set members. Hashable means immutable with a consistent hash:
| Hashable (can be key) | Not hashable (can't be key) |
|---|---|
int, float, str |
list |
tuple (if contents are hashable) |
dict |
frozenset |
set |
bool, None |
any mutable object |
# Tuple as key — common for coordinate problems
grid = {}
grid[(0, 0)] = "start"
grid[(1, 2)] = "wall"
# Convert list to tuple for hashing
state = [1, 2, 3]
seen = set()
seen.add(tuple(state))Pattern: Two Sum (Complement Lookup)
Store values you've seen so you can find complements in O(1):
LeetCode 1 - Two Sum
def two_sum(nums: list[int], target: int) -> list[int]:
seen = {} # value → index
for i, n in enumerate(nums):
complement = target - n
if complement in seen:
return [seen[complement], i]
seen[n] = i
return []Without a dict, this is O(n²) — check every pair. With a dict, one pass.
Pattern: Frequency Counting
LeetCode 242 - Valid Anagram
from collections import Counter
def is_anagram(s: str, t: str) -> bool:
return Counter(s) == Counter(t)LeetCode 347 - Top K Frequent Elements
from collections import Counter
def top_k_frequent(nums: list[int], k: int) -> list[int]:
return [x for x, _ in Counter(nums).most_common(k)] # most_common returns [(elem, count)], we extract just elementsPattern: Deduplication / Seen Set
LeetCode 128 - Longest Consecutive Sequence
def longest_consecutive(nums: list[int]) -> int:
num_set = set(nums)
best = 0
for n in num_set:
if n - 1 not in num_set: # only start counting from sequence starts
length = 1
while n + length in num_set:
length += 1
best = max(best, length)
return bestO(n) — each number is visited at most twice (once in the outer loop, once in a while chain).
Pattern: Grouping
Group elements by a computed key:
LeetCode 49 - Group Anagrams
from collections import defaultdict
def group_anagrams(strs: list[str]) -> list[list[str]]:
groups = defaultdict(list)
for s in strs:
key = tuple(sorted(s))
groups[key].append(s)
return list(groups.values())Using a frequency tuple as key (avoids O(k log k) sort per word):
def group_anagrams(strs: list[str]) -> list[list[str]]:
groups = defaultdict(list)
for s in strs:
key = [0] * 26
for ch in s:
key[ord(ch) - ord('a')] += 1
groups[tuple(key)].append(s)
return list(groups.values())Pattern: Prefix Sum + Hash Map
Combine prefix sums with a hash map to find subarrays with a given sum:
LeetCode 560 - Subarray Sum Equals K
def subarray_sum(nums: list[int], k: int) -> int:
prefix_count = {0: 1} # prefix_sum → count of times seen
current_sum = 0
result = 0
for n in nums:
current_sum += n
result += prefix_count.get(current_sum - k, 0)
prefix_count[current_sum] = prefix_count.get(current_sum, 0) + 1
return resultWhen to Use Hash Maps
- You need O(1) lookup by key
- Counting frequencies
- Detecting duplicates
- Caching computed results (memoization)
- Mapping relationships (graph adjacency lists)
- Prefix sum problems (sum → count mapping)
When NOT to Use Hash Maps
- You need sorted order → use
SortedDictfromsortedcontainersorbisect - Keys are small integers (0 to n) → use a list (faster, less memory)
- You need range queries ("all keys between 5 and 10") → use a sorted structure
- Memory is extremely tight → dicts have ~50-70 bytes overhead per entry
dict vs defaultdict vs Counter
| Feature | dict |
defaultdict |
Counter |
|---|---|---|---|
| Missing key | KeyError |
Auto-creates | Returns 0 |
| Best for | General mapping | Grouping/building | Frequency counting |
most_common() |
No | No | Yes |
Arithmetic (+, -) |
No | No | Yes |
| Construction from iterable | dict.fromkeys() |
Manual | Counter(iterable) |
Practice Problems
| Problem | Difficulty | Link |
|---|---|---|
| Two Sum | Easy | LeetCode 1 |
| Valid Anagram | Easy | LeetCode 242 |
| Group Anagrams | Medium | LeetCode 49 |
| Top K Frequent Elements | Medium | LeetCode 347 |
| Longest Consecutive Sequence | Medium | LeetCode 128 |
| Subarray Sum Equals K | Medium | LeetCode 560 |
| Destination City | Easy | LeetCode 1436 |
| Contains Duplicate II | Easy | LeetCode 219 |
| Isomorphic Strings | Easy | LeetCode 205 |
Key Takeaways
dictgives O(1) average lookup, insert, and delete by hashing keys to array indices- Python dicts maintain insertion order (3.7+) — unlike Go maps
defaultdicteliminates check-then-set boilerplate for grouping and countingCounteris purpose-built for frequencies: one-shot construction,most_common(), arithmetic- Use
setfor O(1) membership testing and deduplication frozensetis hashable — use it as a dict key or in other sets- Only hashable (immutable) types can be keys — convert lists to tuples, sets to frozensets
- Core patterns: complement lookup (Two Sum), frequency counting, grouping, prefix sum + hash map
- If you need order or range queries, a dict is the wrong choice