02. Lists, Strings and Arrays

📋 Jump to Takeaways

Lists are the most fundamental data structure in Python. Everything else is built on top of them or exists because lists aren't good enough for a specific use case. Understanding how Python's dynamic array works explains why lists are fast for some operations, slow for others, and when to reach for alternatives like collections.deque.

Contiguous Memory

A Python list stores pointers to objects side by side in memory. Element 0's pointer is at address X, element 1's pointer is at X+8 (on 64-bit), element 2's pointer is at X+16. This is why index access is O(1) — the interpreter calculates the exact memory address with simple arithmetic.

nums = [10, 20, 30, 40, 50]
# Memory: [ptr0][ptr1][ptr2][ptr3][ptr4]
#          ↓          ↓
#          10         30  ← jump directly, no scanning

print(nums[2])  # O(1) — direct pointer arithmetic

Unlike C or Go arrays, Python lists store references (pointers), not raw values. This means they can hold mixed types, but each element has pointer overhead.

Python Lists Are Dynamic Arrays

Python's list is a dynamic array — a contiguous block of pointers that grows automatically:

nums = []           # len=0, internal capacity ~0
nums.append(1)     # len=1, capacity grows
nums.append(2)     # len=2
nums.append(3)     # len=3
nums.append(4)     # len=4
nums.append(5)     # capacity exceeded → allocate bigger array, copy

When capacity runs out, Python allocates a new array (~1.125x the size for large lists) and copies all pointers. That's why append is O(1) amortized but occasionally O(n).

import sys

nums = []
for i in range(9):
    nums.append(i)
    print(f"len={len(nums)}, size={sys.getsizeof(nums)} bytes")
# size jumps at certain lengths — that's the reallocation

List Construction Patterns

# Preallocate with a known value
zeros = [0] * 100  # 100 zeros, one allocation

# List comprehension — preferred for transforms
squares = [x * x for x in range(10)]

# Build up with append (when size unknown)
result = []
for item in stream:
    if item > threshold:
        result.append(item)

Rule of thumb: Use list comprehensions when you know the transformation. Use append in a loop when logic is complex or conditional.

Slicing

Slicing creates a new list (shallow copy). Unlike Go slices, mutations don't propagate back.

original = [1, 2, 3, 4, 5]
sub = original[1:3]  # [2, 3] — new list

sub[0] = 99
print(original)  # [1, 2, 3, 4, 5] — unchanged!

"Shallow" means one level deep. Reassigning sub[i] = x is always safe. But if items are mutable objects (lists, dicts, sets) and you mutate them in-place, the source sees it:

a = [[1, 2], [3, 4], [5, 6]]
b = a[1:]           # new list, but same inner lists
b[0] = [99, 99]     # reassign → a unchanged ✓
b[1].append(7)      # mutate in-place → a[2] is now [5, 6, 7] ✗

Rule: both conditions needed to affect the source — item is mutable AND you modify it in-place. Use copy.deepcopy() for full independence.

Slicing costs O(k) where k is the slice length — it copies k pointers.

# Common slicing patterns
nums[:]       # full copy
nums[::-1]    # reversed copy
nums[::2]     # every other element
nums[i:]      # from index i to end
nums[:i]      # from start to index i

Operation Costs

Operation Time Why
nums[i] O(1) Pointer arithmetic
nums.append(x) O(1) amortized Write to next slot
nums.pop() O(1) Remove from end
nums.pop(0) O(n) Shift all elements left
nums.insert(i, x) O(n) Shift elements right
del nums[i] O(n) Shift elements left
x in nums O(n) Linear scan
nums[i:j] O(j-i) Copy the slice
nums.sort() O(n log n) Timsort

The critical insight: lists are fast for reading and appending, slow for inserting/deleting anywhere except the end.

collections.deque — O(1) on Both Ends

When you need fast operations on both ends, use deque:

from collections import deque

dq = deque([1, 2, 3])
dq.appendleft(0)  # O(1) — [0, 1, 2, 3]
dq.append(4)      # O(1) — [0, 1, 2, 3, 4]
dq.popleft()      # O(1) — [1, 2, 3, 4]
dq.pop()          # O(1) — [1, 2, 3]
Operation list deque
Append right O(1) O(1)
Append left O(n) O(1)
Pop right O(1) O(1)
Pop left O(n) O(1)
Index access O(1) O(n)

Trade-off: deque gives up O(1) random access. Use it for queues and BFS, not for index-heavy algorithms.

Strings in Python

Strings are immutable sequences of Unicode characters. You cannot change individual characters.

s = "hello"
# s[0] = 'H'  # TypeError: 'str' object does not support item assignment

String concatenation in a loop is O(n²) because each + creates a new string:

# O(n²) — each iteration copies the growing string
result = ""
for word in words:
    result += word

# O(n) — join builds the result in one pass
result = "".join(words)

str.join() is the Python equivalent of Go's strings.Builder. Always use it for building strings from parts.

Converting to a list for mutation:

s = "hello"
chars = list(s)   # ['h', 'e', 'l', 'l', 'o']
chars[0] = 'H'
s = "".join(chars)  # "Hello"

Unicode in Python

Python 3 strings are Unicode by default. len() counts characters (code points), not bytes:

s = "Go🚀"
print(len(s))           # 3 — characters, not bytes
print(len(s.encode()))  # 6 — UTF-8 bytes

Iteration gives you characters directly — no rune/byte distinction like Go:

for ch in "Go🚀":
    print(ch)  # G, o, 🚀

For ASCII-only problems (most LeetCode string problems), you can use ord() and chr() for character arithmetic:

ord('a')       # 97
chr(97)        # 'a'
ord('c') - ord('a')  # 2 — index into a 26-element array

Frequency Counting

Counting character frequencies is a core pattern. For lowercase English letters, a list of 26 is fastest. For general use, collections.Counter is idiomatic.

LeetCode 242 - Valid Anagram

def is_anagram(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    freq = [0] * 26
    for i in range(len(s)):
        freq[ord(s[i]) - ord('a')] += 1
        freq[ord(t[i]) - ord('a')] -= 1
    return all(c == 0 for c in freq)

Using Counter (cleaner, same O(n) complexity):

from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)

Pattern: Prefix Sum

Precompute cumulative sums so any range sum is O(1):

def prefix_sum(nums: list[int]) -> list[int]:
    prefix = [0] * (len(nums) + 1)
    for i in range(len(nums)):
        prefix[i + 1] = prefix[i] + nums[i]
    return prefix

# Sum of nums[l..r] inclusive = prefix[r+1] - prefix[l]

LeetCode 303 - Range Sum Query Immutable

class NumArray:
    def __init__(self, nums: list[int]):
        self.prefix = [0] * (len(nums) + 1)
        for i in range(len(nums)):
            self.prefix[i + 1] = self.prefix[i] + nums[i]

    def sum_range(self, left: int, right: int) -> int:
        return self.prefix[right + 1] - self.prefix[left]

Pattern: Frequency Counting with Counter

Counter from collections gives you a frequency map in one call:

from collections import Counter

# Top K frequent elements
def top_k(nums: list[int], k: int) -> list[int]:
    count = Counter(nums)
    return [x for x, _ in count.most_common(k)]

Counter supports arithmetic:

a = Counter("aabbc")  # {'a': 2, 'b': 2, 'c': 1}
b = Counter("abc")    # {'a': 1, 'b': 1, 'c': 1}
print(a - b)          # Counter({'a': 1, 'b': 1}) — subtract counts

Beyond Counter — Top-K alternatives:

Counter.most_common(k) works but is O(n log n) internally (it sorts). For interviews, knowing faster alternatives shows depth:

Approach Time When to use
Counter.most_common(k) O(n log n) Quick and clean, fine for most cases
Min-heap of size k O(n log k) Streaming data or very large n
Bucket sort O(n) Optimal — frequency is bounded by n

The bucket sort approach is often the best interview answer — O(n) with no imports:

def top_k_bucket(nums: list[int], k: int) -> list[int]:
    freq = {}
    for n in nums:
        freq[n] = freq.get(n, 0) + 1

    # index = frequency, value = numbers with that frequency
    buckets = [[] for _ in range(len(nums) + 1)]
    for num, count in freq.items():
        buckets[count].append(num)

    result = []
    for i in range(len(buckets) - 1, -1, -1):
        for num in buckets[i]:
            result.append(num)
            if len(result) == k:
                return result

Without Counter, frequency counting is just a dict:

freq = {}
for n in nums:
    freq[n] = freq.get(n, 0) + 1

Pattern: Read/Write Pointer

Scan with one pointer, write with another. Compacts in-place.

LeetCode 26 - Remove Duplicates from Sorted Array

def remove_duplicates(nums: list[int]) -> int:
    if not nums:
        return 0
    w = 1
    for r in range(1, len(nums)):
        if nums[r] != nums[r - 1]:
            nums[w] = nums[r]
            w += 1
    return w

LeetCode 283 - Move Zeroes

def move_zeroes(nums: list[int]) -> None:
    w = 0
    for r in range(len(nums)):
        if nums[r] != 0:
            nums[w], nums[r] = nums[r], nums[w]
            w += 1

Pattern: Rotate Array

Three reverses, O(n) time, O(1) space:

def rotate(nums: list[int], k: int) -> None:
    k %= len(nums)
    nums.reverse()
    nums[:k] = nums[:k][::-1]
    nums[k:] = nums[k:][::-1]
    # Alt: [::-1] creates a copy. Use nums.reverse() for in-place full reverse.
    # Both work here since slice assignment writes back in-place.

LeetCode 88 - Merge Sorted Array

Fill from the end to avoid overwriting:

def merge(nums1: list[int], m: int, nums2: list[int], n: int) -> None:
    i, j, k = m - 1, n - 1, m + n - 1
    while j >= 0:
        if i >= 0 and nums1[i] > nums2[j]:
            nums1[k] = nums1[i]
            i -= 1
        else:
            nums1[k] = nums2[j]
            j -= 1
        k -= 1

When to Use Lists

  • You need O(1) random access by index
  • Data grows at the end (append-heavy)
  • You're doing sliding window or two-pointer problems
  • List comprehensions make the transform readable

When NOT to Use Lists

  • Frequent insertions/deletions at the front → collections.deque
  • Need O(1) lookup by value → set or dict
  • Need sorted order with fast insert → bisect module or SortedList
  • Need a queue (FIFO) → collections.deque

Practice Problems

Problem Difficulty Link
Two Sum Easy LeetCode 1
Valid Anagram Easy LeetCode 242
Merge Sorted Array Easy LeetCode 88
Remove Duplicates from Sorted Array Easy LeetCode 26
Move Zeroes Easy LeetCode 283
Best Time to Buy and Sell Stock Easy LeetCode 121
Rotate Array Medium LeetCode 189
Maximum Subarray Medium LeetCode 53
Product of Array Except Self Medium LeetCode 238
Range Sum Query - Immutable Easy LeetCode 303
Subarray Sum Equals K Medium LeetCode 560

What About Actual Arrays?

Python has no fixed-size array like Go or C. When a problem says "array," use list. But Python does have typed array options:

Type Memory per int Use case
list ~28 bytes (pointer + object) Everything — your default
array.array 4-8 bytes (raw C value) Compact numeric storage without NumPy
numpy.ndarray 4-8 bytes (contiguous) Math, vectorized operations, ML
import array

# Standard list: each int is a full Python object on the heap
nums_list = [1, 2, 3, 4, 5]  # ~28 bytes per element

# array.array: raw C integers packed together
nums_arr = array.array('i', [1, 2, 3, 4, 5])  # 4 bytes per element

For interviews and DSA, always use list. Nobody expects array.array. It exists for niche cases where you need millions of numbers with minimal memory and don't want NumPy.

Key Takeaways

  • Python lists are dynamic arrays of pointers — O(1) access, O(1) amortized append
  • Slicing creates a new list (shallow copy) — unlike Go, no shared backing array
  • list.pop(0) and list.insert(0, x) are O(n) — use deque for O(1) on both ends
  • Strings are immutable. Use "".join() for concatenation, never += in a loop
  • len(s) counts characters in Python 3, not bytes — no rune/byte distinction
  • Counter is the idiomatic way to count frequencies; [0]*26 is faster for ASCII problems
  • Prefix sum, read/write pointer, and frequency counting are the core array patterns
  • Know when to switch: deque for queues, set for membership, bisect for sorted insertion

📝 Ready to test your knowledge?

Answer the quiz below to mark this lesson complete.

Spot something off? Report an issue

© 2026 ByteLearn.dev. Free courses for developers. · Privacy