06 - Rate Limiting & Throttling

Rate limiting controls how fast work happens. Without it, you'll overwhelm APIs, databases, or your own system. Go makes this easy with tickers, channels, and the golang.org/x/time/rate package.

Why Rate Limiting

Most external APIs enforce rate limits. Hit them too fast and you get 429 errors or banned. Even internal systems have limits — a database can handle 1,000 queries per second, not 100,000. Rate limiting protects both the services you call and your own infrastructure.

You need rate limiting when:

  • Calling external APIs with request quotas (Stripe, GitHub, AWS)
  • Processing a backlog of work that shouldn't overwhelm downstream services
  • Handling user requests where you want to prevent abuse
  • Running concurrent workers that share a limited resource

Simple Rate Limiting with time.Ticker

A ticker sends a value at regular intervals. Use it as a gate.

func main() {
    requests := []int{1, 2, 3, 4, 5}
    limiter := time.NewTicker(200 * time.Millisecond) // 5 per second
    defer limiter.Stop()

    for _, req := range requests {
        <-limiter.C // wait for the next tick
        fmt.Printf("request %d at %s\n", req, time.Now().Format("15:04:05.000"))
    }
}

Each request waits for the ticker. Exactly 200ms between requests — 5 requests per second.

Burst Rate Limiting

Sometimes you want to allow a burst of requests, then throttle. Use a buffered channel as a token bucket.

The difference from time.Ticker: with a ticker, even the first request waits. With a buffered channel pre-filled with tokens, the first N requests fire instantly — the buffer size is the burst size. Once the buffer is empty, requests wait for refills at a steady rate, just like the ticker approach.

func main() {
    // Allow bursts of 3, then 1 per 200ms
    bucket := make(chan struct{}, 3)

    // Pre-fill the bucket (initial burst capacity)
    for i := 0; i < 3; i++ {
        bucket <- struct{}{}
    }

    // Refill at a steady rate
    go func() {
        ticker := time.NewTicker(200 * time.Millisecond)
        defer ticker.Stop()
        for range ticker.C {
            select {
            case bucket <- struct{}{}:
            default: // bucket full, skip
            }
        }
    }()

    // Process requests
    for i := 1; i <= 10; i++ {
        <-bucket // take a token
        fmt.Printf("request %d at %s\n", i, time.Now().Format("15:04:05.000"))
    }
}

The first 3 requests fire immediately (burst). After that, one request every 200ms as the bucket refills.

request 1 at 10:00:00.000  ← instant
request 2 at 10:00:00.000  ← instant
request 3 at 10:00:00.000  ← instant (burst exhausted)
request 4 at 10:00:00.200  ← throttled
request 5 at 10:00:00.400  ← throttled
...

golang.org/x/time/rate

The standard library doesn't have a rate limiter, but the official x/time/rate package does. It implements a token bucket — combining both approaches above: a steady rate (like the ticker) with a burst allowance (like the buffered channel). This is what you'd use in production.

go get golang.org/x/time/rate
import "golang.org/x/time/rate"

func main() {
    // 5 events per second, burst of 1
    limiter := rate.NewLimiter(5, 1)

    for i := 0; i < 10; i++ {
        err := limiter.Wait(context.Background()) // blocks until allowed
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        fmt.Printf("request %d at %s\n", i, time.Now().Format("15:04:05.000"))
    }
}

rate.NewLimiter(r, b): r is events per second, b is the burst size.

Three ways to use it:

Method      Behavior
Wait(ctx)   Blocks until allowed. Respects context cancellation.
Allow()     Returns true if allowed right now, false otherwise. Non-blocking.
Reserve()   Returns a reservation with the delay. You decide what to do.

Rate Limiting Workers

Apply rate limiting to a worker pool. Each worker checks the limiter before processing.

func worker(ctx context.Context, id int, limiter *rate.Limiter, jobs <-chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    for job := range jobs {
        if err := limiter.Wait(ctx); err != nil {
            fmt.Printf("worker %d: rate limit error: %v\n", id, err)
            return
        }
        fmt.Printf("worker %d processing job %d at %s\n", id, job, time.Now().Format("15:04:05.000"))
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    limiter := rate.NewLimiter(2, 1) // 2 per second, burst 1
    jobs := make(chan int, 20)
    var wg sync.WaitGroup

    for i := 0; i < 3; i++ {
        wg.Add(1)
        go worker(ctx, i, limiter, jobs, &wg)
    }

    for j := 0; j < 15; j++ {
        jobs <- j
    }
    close(jobs)

    wg.Wait()
}

All workers share one limiter. Even with 3 workers, total throughput is capped at 2 per second. The limiter is goroutine-safe.

Per-Worker vs Shared Limiter

                         Shared limiter                  Per-worker limiter
Total throughput         Capped globally                 Capped per worker
3 workers, 2/sec each    2/sec total                     6/sec total
Use case                 API rate limits (global cap)    Fair distribution

For external API rate limits, use a shared limiter. For fair resource distribution, use per-worker limiters.

Non-Blocking Rate Limiting

limiter.Wait(ctx) blocks until a request is allowed — fine for background jobs, but you don't want an HTTP server holding connections open. Use limiter.Allow() to reject immediately instead of queuing.

var limiter = rate.NewLimiter(10, 5) // 10 req/s, burst of 5

func handler(w http.ResponseWriter, r *http.Request) {
    if !limiter.Allow() {
        http.Error(w, "rate limited", http.StatusTooManyRequests)
        return
    }
    fmt.Fprint(w, "OK")
}

If the limit is exceeded, the handler returns 429 immediately. No waiting, no queued connections. Note that this limiter is shared by all clients; limiting each client separately requires one limiter per client key (an IP address or API token, for example).

Throttling with time.Ticker in Pipelines

You can throttle a pipeline stage using just a ticker — no external packages needed.

func throttle(ctx context.Context, in <-chan int, perSecond int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        // e.g. perSecond=10 → tick every 100ms
        ticker := time.NewTicker(time.Second / time.Duration(perSecond))
        defer ticker.Stop()

        for val := range in {
            select {
            // ticker.C is a channel that receives a value every interval
            // blocks here until the next tick, gating one value per interval
            case <-ticker.C:
                select {
                case out <- val:
                case <-ctx.Done():
                    return
                }
            case <-ctx.Done():
                return
            }
        }
    }()
    return out
}

Use it like any other pipeline stage — plug it between a producer and consumer:

nums := generator(ctx, 20)
throttled := throttle(ctx, nums, 5) // cap at 5 per second

for val := range throttled {
    fmt.Println(val)
}

Dynamic Rate Adjustment

rate.Limiter lets you change the rate on the fly — no need to create a new limiter.

limiter := rate.NewLimiter(10, 1) // start at 10/sec

// Later, slow down
limiter.SetLimit(2) // now 2/sec

// Or speed up
limiter.SetLimit(50) // now 50/sec

// Change burst
limiter.SetBurst(5)

Useful for adaptive systems — slow down when errors increase, speed up when things are healthy.

Key Takeaways

  • time.Ticker for simple fixed-rate limiting, one event per tick
  • Buffered channel as a token bucket for burst + steady rate
  • golang.org/x/time/rate for production use. It combines both approaches into a token bucket
  • Wait(ctx) blocks until allowed, Allow() returns immediately, Reserve() gives you control
  • Shared limiter caps total throughput, per-worker limiter caps each worker independently
  • Use Allow() for HTTP handlers. Drop excess requests with 429
  • SetLimit() and SetBurst() adjust rates at runtime
