02. Lists, Strings and Arrays

📋 Jump to Takeaways

Lists are the most fundamental data structure in Python. Everything else is built on top of them or exists because lists aren't good enough for a specific use case. Understanding how Python's dynamic array works explains why lists are fast for some operations, slow for others, and when to reach for alternatives like collections.deque.

Contiguous Memory

A Python list stores pointers to objects side by side in memory. Element 0's pointer is at address X, element 1's pointer is at X+8 (on 64-bit), element 2's pointer is at X+16. This is why index access is O(1) — the interpreter calculates the exact memory address with simple arithmetic.

nums = [10, 20, 30, 40, 50]
# Memory: [ptr0][ptr1][ptr2][ptr3][ptr4]
#          ↓          ↓
#          10         30  ← jump directly, no scanning

print(nums[2])  # O(1) — direct pointer arithmetic

Unlike C or Go arrays, Python lists store references (pointers), not raw values. This means they can hold mixed types, but each element has pointer overhead.

Python Lists Are Dynamic Arrays

Python's list is a dynamic array — a contiguous block of pointers that grows automatically:

nums = []           # len=0, internal capacity ~0
nums.append(1)     # len=1, capacity grows
nums.append(2)     # len=2
nums.append(3)     # len=3
nums.append(4)     # len=4
nums.append(5)     # capacity exceeded → allocate bigger array, copy

When capacity runs out, Python allocates a new array (~1.125x the size for large lists) and copies all pointers. That's why append is O(1) amortized but occasionally O(n).

import sys

nums = []
for i in range(9):
    nums.append(i)
    print(f"len={len(nums)}, size={sys.getsizeof(nums)} bytes")
# size jumps at certain lengths — that's the reallocation

List Construction Patterns

# Preallocate with a known value
zeros = [0] * 100  # 100 zeros, one allocation

# List comprehension — preferred for transforms
squares = [x * x for x in range(10)]

# Build up with append (when size unknown)
result = []
for item in stream:
    if item > threshold:
        result.append(item)

Rule of thumb: Use list comprehensions when you know the transformation. Use append in a loop when logic is complex or conditional.

Slicing

Slicing creates a new list (shallow copy). Unlike Go slices, mutations don't propagate back.

original = [1, 2, 3, 4, 5]
sub = original[1:3]  # [2, 3] — new list

sub[0] = 99
print(original)  # [1, 2, 3, 4, 5] — unchanged!

"Shallow" means one level deep. Reassigning sub[i] = x is always safe. But if items are mutable objects (lists, dicts, sets) and you mutate them in-place, the source sees it:

a = [[1, 2], [3, 4], [5, 6]]
b = a[1:]           # new list, but same inner lists
b[0] = [99, 99]     # reassign → a unchanged ✓
b[1].append(7)      # mutate in-place → a[2] is now [5, 6, 7] ✗

Rule: both conditions needed to affect the source — item is mutable AND you modify it in-place. Use copy.deepcopy() for full independence.

Slicing costs O(k) where k is the slice length — it copies k pointers.

# Common slicing patterns
nums[:]       # full copy
nums[::-1]    # reversed copy
nums[::2]     # every other element
nums[i:]      # from index i to end
nums[:i]      # from start to index i

Operation Costs

Operation	Time	Why
`nums[i]`	O(1)	Pointer arithmetic
`nums.append(x)`	O(1) amortized	Write to next slot
`nums.pop()`	O(1)	Remove from end
`nums.pop(0)`	O(n)	Shift all elements left
`nums.insert(i, x)`	O(n)	Shift elements right
`del nums[i]`	O(n)	Shift elements left
`x in nums`	O(n)	Linear scan
`nums[i:j]`	O(j-i)	Copy the slice
`nums.sort()`	O(n log n)	Timsort

The critical insight: lists are fast for reading and appending, slow for inserting/deleting anywhere except the end.

collections.deque — O(1) on Both Ends

When you need fast operations on both ends, use deque:

from collections import deque

dq = deque([1, 2, 3])
dq.appendleft(0)  # O(1) — [0, 1, 2, 3]
dq.append(4)      # O(1) — [0, 1, 2, 3, 4]
dq.popleft()      # O(1) — [1, 2, 3, 4]
dq.pop()          # O(1) — [1, 2, 3]

Operation	list	deque
Append right	O(1)	O(1)
Append left	O(n)	O(1)
Pop right	O(1)	O(1)
Pop left	O(n)	O(1)
Index access	O(1)	O(n)

Trade-off: deque gives up O(1) random access. Use it for queues and BFS, not for index-heavy algorithms.

Strings in Python

Strings are immutable sequences of Unicode characters. You cannot change individual characters.

s = "hello"
# s[0] = 'H'  # TypeError: 'str' object does not support item assignment

String concatenation in a loop is O(n²) because each + creates a new string:

# O(n²) — each iteration copies the growing string
result = ""
for word in words:
    result += word

# O(n) — join builds the result in one pass
result = "".join(words)

str.join() is the Python equivalent of Go's strings.Builder. Always use it for building strings from parts.

Converting to a list for mutation:

s = "hello"
chars = list(s)   # ['h', 'e', 'l', 'l', 'o']
chars[0] = 'H'
s = "".join(chars)  # "Hello"

Unicode in Python

Python 3 strings are Unicode by default. len() counts characters (code points), not bytes:

s = "Go🚀"
print(len(s))           # 3 — characters, not bytes
print(len(s.encode()))  # 6 — UTF-8 bytes

Iteration gives you characters directly — no rune/byte distinction like Go:

for ch in "Go🚀":
    print(ch)  # G, o, 🚀

For ASCII-only problems (most LeetCode string problems), you can use ord() and chr() for character arithmetic:

ord('a')       # 97
chr(97)        # 'a'
ord('c') - ord('a')  # 2 — index into a 26-element array

Frequency Counting

Counting character frequencies is a core pattern. For lowercase English letters, a list of 26 is fastest. For general use, collections.Counter is idiomatic.

LeetCode 242 - Valid Anagram

def is_anagram(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    freq = [0] * 26
    for i in range(len(s)):
        freq[ord(s[i]) - ord('a')] += 1
        freq[ord(t[i]) - ord('a')] -= 1
    return all(c == 0 for c in freq)

Using Counter (cleaner, same O(n) complexity):

from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)

Pattern: Prefix Sum

Precompute cumulative sums so any range sum is O(1):

def prefix_sum(nums: list[int]) -> list[int]:
    prefix = [0] * (len(nums) + 1)
    for i in range(len(nums)):
        prefix[i + 1] = prefix[i] + nums[i]
    return prefix

# Sum of nums[l..r] inclusive = prefix[r+1] - prefix[l]

LeetCode 303 - Range Sum Query Immutable

class NumArray:
    def __init__(self, nums: list[int]):
        self.prefix = [0] * (len(nums) + 1)
        for i in range(len(nums)):
            self.prefix[i + 1] = self.prefix[i] + nums[i]

    def sum_range(self, left: int, right: int) -> int:
        return self.prefix[right + 1] - self.prefix[left]

Pattern: Frequency Counting with Counter

Counter from collections gives you a frequency map in one call:

from collections import Counter

# Top K frequent elements
def top_k(nums: list[int], k: int) -> list[int]:
    count = Counter(nums)
    return [x for x, _ in count.most_common(k)]

Counter supports arithmetic:

a = Counter("aabbc")  # {'a': 2, 'b': 2, 'c': 1}
b = Counter("abc")    # {'a': 1, 'b': 1, 'c': 1}
print(a - b)          # Counter({'a': 1, 'b': 1}) — subtract counts

Beyond Counter — Top-K alternatives:

Counter.most_common(k) works but is O(n log n) internally (it sorts). For interviews, knowing faster alternatives shows depth:

Approach	Time	When to use
`Counter.most_common(k)`	O(n log n)	Quick and clean, fine for most cases
Min-heap of size k	O(n log k)	Streaming data or very large n
Bucket sort	O(n)	Optimal — frequency is bounded by n

The bucket sort approach is often the best interview answer — O(n) with no imports:

def top_k_bucket(nums: list[int], k: int) -> list[int]:
    freq = {}
    for n in nums:
        freq[n] = freq.get(n, 0) + 1

    # index = frequency, value = numbers with that frequency
    buckets = [[] for _ in range(len(nums) + 1)]
    for num, count in freq.items():
        buckets[count].append(num)

    result = []
    for i in range(len(buckets) - 1, -1, -1):
        for num in buckets[i]:
            result.append(num)
            if len(result) == k:
                return result

Without Counter, frequency counting is just a dict:

freq = {}
for n in nums:
    freq[n] = freq.get(n, 0) + 1

Pattern: Read/Write Pointer

Scan with one pointer, write with another. Compacts in-place.

LeetCode 26 - Remove Duplicates from Sorted Array

def remove_duplicates(nums: list[int]) -> int:
    if not nums:
        return 0
    w = 1
    for r in range(1, len(nums)):
        if nums[r] != nums[r - 1]:
            nums[w] = nums[r]
            w += 1
    return w

LeetCode 283 - Move Zeroes

def move_zeroes(nums: list[int]) -> None:
    w = 0
    for r in range(len(nums)):
        if nums[r] != 0:
            nums[w], nums[r] = nums[r], nums[w]
            w += 1

Pattern: Rotate Array

Three reverses, O(n) time, O(1) space:

def rotate(nums: list[int], k: int) -> None:
    k %= len(nums)
    nums.reverse()
    nums[:k] = nums[:k][::-1]
    nums[k:] = nums[k:][::-1]
    # Alt: [::-1] creates a copy. Use nums.reverse() for in-place full reverse.
    # Both work here since slice assignment writes back in-place.

LeetCode 88 - Merge Sorted Array

Fill from the end to avoid overwriting:

def merge(nums1: list[int], m: int, nums2: list[int], n: int) -> None:
    i, j, k = m - 1, n - 1, m + n - 1
    while j >= 0:
        if i >= 0 and nums1[i] > nums2[j]:
            nums1[k] = nums1[i]
            i -= 1
        else:
            nums1[k] = nums2[j]
            j -= 1
        k -= 1

When to Use Lists

You need O(1) random access by index
Data grows at the end (append-heavy)
You're doing sliding window or two-pointer problems
List comprehensions make the transform readable

When NOT to Use Lists

Frequent insertions/deletions at the front → collections.deque
Need O(1) lookup by value → set or dict
Need sorted order with fast insert → bisect module or SortedList
Need a queue (FIFO) → collections.deque

Practice Problems

Problem	Difficulty	Link
Two Sum	Easy	LeetCode 1
Valid Anagram	Easy	LeetCode 242
Merge Sorted Array	Easy	LeetCode 88
Remove Duplicates from Sorted Array	Easy	LeetCode 26
Move Zeroes	Easy	LeetCode 283
Best Time to Buy and Sell Stock	Easy	LeetCode 121
Rotate Array	Medium	LeetCode 189
Maximum Subarray	Medium	LeetCode 53
Product of Array Except Self	Medium	LeetCode 238
Range Sum Query - Immutable	Easy	LeetCode 303
Subarray Sum Equals K	Medium	LeetCode 560

What About Actual Arrays?

Python has no fixed-size array like Go or C. When a problem says "array," use list. But Python does have typed array options:

Type	Memory per int	Use case
`list`	~28 bytes (pointer + object)	Everything — your default
`array.array`	4-8 bytes (raw C value)	Compact numeric storage without NumPy
`numpy.ndarray`	4-8 bytes (contiguous)	Math, vectorized operations, ML

import array

# Standard list: each int is a full Python object on the heap
nums_list = [1, 2, 3, 4, 5]  # ~28 bytes per element

# array.array: raw C integers packed together
nums_arr = array.array('i', [1, 2, 3, 4, 5])  # 4 bytes per element

For interviews and DSA, always use list. Nobody expects array.array. It exists for niche cases where you need millions of numbers with minimal memory and don't want NumPy.

Key Takeaways

Python lists are dynamic arrays of pointers — O(1) access, O(1) amortized append
Slicing creates a new list (shallow copy) — unlike Go, no shared backing array
list.pop(0) and list.insert(0, x) are O(n) — use deque for O(1) on both ends
Strings are immutable. Use "".join() for concatenation, never += in a loop
len(s) counts characters in Python 3, not bytes — no rune/byte distinction
Counter is the idiomatic way to count frequencies; [0]*26 is faster for ASCII problems
Prefix sum, read/write pointer, and frequency counting are the core array patterns
Know when to switch: deque for queues, set for membership, bisect for sorted insertion