13 - Graceful Shutdown


Hit Ctrl+C on your server and it dies instantly. Any in-flight request gets a broken connection. Any database transaction gets rolled back by the server, not your code. Any buffered log entry disappears. In development, who cares. In production, this corrupts data and confuses users.

Graceful shutdown means: stop accepting new connections, finish what you're doing, clean up, then exit.

The Problem

http.ListenAndServe blocks until it returns an error. The kill command sends the process SIGTERM by default, and Ctrl+C in a terminal sends SIGINT. A Go program that doesn't handle these signals is terminated immediately. No cleanup. No goodbye.

// This is what we've been doing. It works, but it's not production-ready.
log.Fatal(http.ListenAndServe(":8080", mux))

Catching Signals

The os/signal package lets you intercept OS signals before they kill your process:

ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()

signal.NotifyContext returns a context that cancels when the signal arrives. os.Interrupt is Ctrl+C (SIGINT). syscall.SIGTERM is what Docker, Kubernetes, and systemd send when they want your process to stop.

http.Server.Shutdown

http.Server has a Shutdown method that does exactly what we need:

Closes all listeners (stops accepting new connections)
Closes idle connections
Waits for active requests to complete, then closes their connections
Returns when everything is done or the context expires, whichever comes first

srv := &http.Server{
    Addr:    ":8080",
    Handler: mux,
}

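// Shutdown takes a context. If the context expires first, Shutdown stops
// waiting and returns the context's error.
err := srv.Shutdown(ctx)

The context gives you a deadline. If requests don't finish in time, Shutdown gives up and returns the context's error. It does not close the remaining connections itself: they are cut off when the process exits, or earlier if you call srv.Close(). You don't want to wait forever for a stuck client.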

The Complete Pattern

Here's the full main() with graceful shutdown:

func main() {
    cfg := loadConfig()
    db, err := openDB(cfg.DatabaseURL)
    if err != nil {
        slog.Error("open db", "error", err)
        os.Exit(1)
    }
    store := NewBookmarkStore(db)

    mux := http.NewServeMux()
    registerRoutes(mux, store)

    srv := &http.Server{
        Addr:    ":" + cfg.Port,
        Handler: mux,
    }

    // Start server in a goroutine
    go func() {
        slog.Info("server starting", "addr", srv.Addr)
        if err := srv.ListenAndServe(); err != http.ErrServerClosed {
            slog.Error("server error", "error", err)
            os.Exit(1)
        }
    }()

    // Wait for interrupt signal
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()
    <-ctx.Done()

    // Shutdown with a timeout
    slog.Info("shutting down")
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // Second signal forces immediate exit
    go func() {
        // ctx is already cancelled, but NotifyContext keeps its signal
        // registration until stop() runs. Register a separate channel
        // to see the next signal.
        sig := make(chan os.Signal, 1)
        signal.Notify(sig, os.Interrupt, syscall.SIGTERM)
        <-sig
        slog.Warn("forced shutdown")
        os.Exit(1)
    }()

    if err := srv.Shutdown(shutdownCtx); err != nil {
        slog.Error("shutdown error", "error", err)
    }

    // Clean up resources
    db.Close()
    slog.Info("server stopped")
}

Walk through it:

Start the HTTP server in a goroutine so main can continue
ListenAndServe returns http.ErrServerClosed when Shutdown is called. That's expected, not an error
Block on <-ctx.Done(), which fires when SIGINT or SIGTERM arrives
Create a new context with a 10-second timeout for the shutdown itself
A goroutine listens for a second signal — if the user hits Ctrl+C again during shutdown, it force-exits immediately
Call srv.Shutdown, which drains active connections
Close the database connection
Exit cleanly

Why a Separate Shutdown Context?

The signal context (ctx) is already cancelled when we reach the shutdown code. We need a fresh context with its own deadline. The 10-second timeout is a safety net. If a request is stuck (slow client, deadlocked handler), we don't wait forever.

shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

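Pick a timeout that makes sense for your app. 10 seconds is reasonable for most APIs. Long-running uploads may need more. WebSockets and other hijacked connections are a separate case: Shutdown neither waits for them nor closes them, so a longer timeout doesn't help. You have to notify and close those connections yourself.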
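For long-lived connections that Shutdown does not wait for (hijacked connections such as WebSockets), http.Server offers RegisterOnShutdown, which runs a callback when Shutdown begins. The sketch below only shows that the hook fires. What the callback does with it (for example, telling WebSocket handlers to close) is up to the application. notifiedOnShutdown is a hypothetical helper name.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// notifiedOnShutdown registers a shutdown hook on a server and reports
// whether the hook ran after Shutdown was called.
func notifiedOnShutdown() bool {
	srv := &http.Server{}

	// Handlers that own long-lived connections could wait on this channel
	// and close their connections when it is closed.
	closing := make(chan struct{})
	srv.RegisterOnShutdown(func() { close(closing) })

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return false
	}

	// Shutdown starts hooks in their own goroutines and does not wait for
	// them, so give the hook a moment to run.
	select {
	case <-closing:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("shutdown hook ran:", notifiedOnShutdown())
}
```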

Cleaning Up Resources

Shutdown isn't just about HTTP. Close everything in reverse order of creation:

// After srv.Shutdown returns:
db.Close()          // close database connections

If you have other resources (message queues, file handles, background workers), close them here too. The pattern is always the same: stop accepting work, finish current work, release resources.

Testing Shutdown

You can test the shutdown behavior by sending a signal to your own process:

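# Build and start the server. Run the binary directly: with "go run", $! is
# the PID of the go tool, which may not pass SIGTERM on to your server.
go build -o bin/api ./api
./bin/api &

# Send SIGTERM
kill -TERM $!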

# Or just Ctrl+C in the foreground

You should see the "shutting down" and "server stopped" log messages. If you have a slow request in flight, it should complete before the server exits (up to the timeout).

Applying to Our Project

Replace the log.Fatal(http.ListenAndServe(...)) in main.go with the full shutdown pattern:

package main

import (
    "context"
    "log/slog"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    cfg := loadConfig()

    db, err := openDB(cfg.DatabaseURL)
    if err != nil {
        slog.Error("open db", "error", err)
        os.Exit(1)
    }

    store := NewBookmarkStore(db)
    mux := http.NewServeMux()
    registerRoutes(mux, store)

    handler := chainMiddleware(mux,
        Recovery,
        RequestID,
        Logger,
    )

    srv := &http.Server{
        Addr:    ":" + cfg.Port,
        Handler: handler,
    }

    go func() {
        slog.Info("server starting", "addr", srv.Addr)
        if err := srv.ListenAndServe(); err != http.ErrServerClosed {
            slog.Error("server error", "error", err)
            os.Exit(1)
        }
    }()

    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()
    <-ctx.Done()

    slog.Info("shutting down")
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    if err := srv.Shutdown(shutdownCtx); err != nil {
        slog.Error("shutdown error", "error", err)
    }

    db.Close()
    slog.Info("server stopped")
}

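This is a solid baseline for production. Kubernetes sends SIGTERM, your server drains connections, closes the database, and exits with code 0. The pod terminates cleanly, and requests that finish within the 10-second timeout are not dropped. Keep that timeout below the pod's termination grace period (30 seconds by default), because Kubernetes sends SIGKILL once it runs out.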

Beyond HTTP: Shutting Down Background Work

Our example only shuts down an HTTP server and a database connection. Real applications often have more to clean up: background jobs, message queue consumers, async publishers, cache connections.

The pattern is the same — listen for the signal, then shut things down in order:

// 1. Stop accepting new work
srv.Shutdown(shutdownCtx)

// 2. Cancel in-flight background jobs
jobManager.CancelAll()

// 3. Flush async writers (queues, publishers)
publisher.Close()

// 4. Close connections
db.Close()
cache.Close()

Order matters. Stop accepting work first, then wait for in-flight work to finish, then flush any buffered writes, then close connections. If you close the database before flushing a publisher that writes to it, you lose data.

For apps with long-running background goroutines, use a status flag so other parts of the code can check if shutdown is in progress and stop picking up new work:

var shuttingDown atomic.Bool

func isShuttingDown() bool {
    return shuttingDown.Load()
}
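Here is a small sketch of how a worker loop might consult that flag. drainQueue is a hypothetical function, and the stopAfter parameter only simulates a signal arriving partway through. In a real program the shutdown code in main would set the flag.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var shuttingDown atomic.Bool

func isShuttingDown() bool {
	return shuttingDown.Load()
}

// drainQueue processes jobs in order and stops picking up new ones once the
// shutdown flag is set. The job in hand is always finished first.
func drainQueue(jobs []string, stopAfter int) []string {
	var processed []string
	for _, job := range jobs {
		if isShuttingDown() {
			break // leave the remaining jobs for another run or instance
		}
		processed = append(processed, job) // stand-in for real work
		if len(processed) == stopAfter {
			shuttingDown.Store(true) // simulated signal
		}
	}
	return processed
}

func main() {
	done := drainQueue([]string{"a", "b", "c", "d"}, 2)
	fmt.Println("processed before shutdown:", done)
}
```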

The bookmarks API doesn't need this — srv.Shutdown handles everything. But when you build something with background workers, queues, or scheduled jobs, plan your shutdown order carefully.

Key Takeaways

A server started with a bare http.ListenAndServe dies immediately on SIGINT or SIGTERM because nothing handles the signal. That's not safe for production
Use signal.NotifyContext to catch SIGINT and SIGTERM
http.Server.Shutdown stops accepting connections and waits for active requests to finish
Always use a timeout context for shutdown. Don't wait forever for stuck requests
Start the server in a goroutine so main can handle the shutdown flow
http.ErrServerClosed is the expected return from ListenAndServe after Shutdown. It's not an error
Close resources (database, files, queues) after the server has fully stopped