08 - Error Handling in Concurrent Code


Error handling in sequential code is simple. Check err, return it. In concurrent code, errors happen in goroutines that can't return values to the caller. You need a strategy.

Why This Is Different

In sequential code, errors bubble up through return values. A goroutine can't do that: the go statement discards the function's return values, and the goroutine runs independently of whoever launched it. If you ignore this, errors disappear silently, and your program looks like it succeeded when it didn't.

You need concurrent error handling when:

Multiple goroutines do work and any of them can fail
You want to stop all work when the first error happens
You need to collect errors from all goroutines, not just the first
A goroutine might panic and you need to prevent it from crashing the whole program

Error Channels

The simplest approach: send errors through a channel.

package main

import (
    "fmt"
    "sync"
    "time"
)

func doWork(id int) error {
    if id == 3 {
        return fmt.Errorf("worker %d failed", id)
    }
    time.Sleep(200 * time.Millisecond) // simulate work
    return nil
}

func main() {
    errCh := make(chan error, 5)
    var wg sync.WaitGroup

    for i := 1; i <= 5; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            if err := doWork(id); err != nil {
                errCh <- err
            }
        }(i)
    }

    wg.Wait()
    close(errCh)

    for err := range errCh {
        fmt.Println("error:", err)
    }
}

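Buffer the error channel to match the number of goroutines. In this example main only reads after wg.Wait(), so with an unbuffered channel a failing goroutine would block on its send forever, wg.Wait() would never return, and the program would deadlock.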

First Error Wins

Often you want to stop everything on the first error. Combine an error channel with context cancellation.

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    errCh := make(chan error, 1) // buffer 1 — only need the first error
    var wg sync.WaitGroup

    for i := 1; i <= 5; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            select {
            case <-ctx.Done():
                return
            default:
            }

            if err := doWork(id); err != nil {
                select {
                case errCh <- err: // send first error
                    cancel() // cancel all other goroutines
                default: // another goroutine already sent an error
                }
            }
        }(i)
    }

    wg.Wait()
    close(errCh)

    if err := <-errCh; err != nil {
        fmt.Println("failed:", err)
    } else {
        fmt.Println("all succeeded")
    }
}

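The first goroutine to fail sends its error and cancels the context. The other goroutines only notice if they check ctx.Done(): here each one checks once before calling doWork, so any goroutine already inside doWork still runs to completion. To stop in-flight work, the work function itself has to take the context and watch it.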

errgroup

golang.org/x/sync/errgroup wraps the "First Error Wins" pattern into a clean API. It's the go-to tool for concurrent error handling in Go, even though it lives in the x/sync module rather than the standard library.

The manual pattern gives you first-error behavior, but you handle everything else yourself — WaitGroup, context, cleanup. errgroup adds:

Concern	Manual "First Error Wins"	errgroup
First error returned	You build it	Built-in
WaitGroup	Manual `wg.Add`/`wg.Done`	Automatic
Context cancellation	Wire it yourself	`WithContext` does it for you
Concurrency limit	Semaphore or channel	`SetLimit`
Boilerplate	More	Less

Here's how it looks in practice:

go get golang.org/x/sync/errgroup

import "golang.org/x/sync/errgroup"

func main() {
    g, ctx := errgroup.WithContext(context.Background())

    for i := 1; i <= 5; i++ {
        id := i // copy the loop variable (required before Go 1.22, redundant since)
        g.Go(func() error {
            select {
            case <-ctx.Done():
                return ctx.Err()
            default:
            }

            if id == 3 {
                return fmt.Errorf("worker %d failed", id)
            }
            time.Sleep(200 * time.Millisecond) // simulate work
            fmt.Printf("worker %d done\n", id)
            return nil
        })
    }

    if err := g.Wait(); err != nil {
        fmt.Println("failed:", err)
    } else {
        fmt.Println("all succeeded")
    }
}

errgroup.WithContext creates a group and a derived context. The context is cancelled the first time a function passed to g.Go returns a non-nil error, or when g.Wait() returns, whichever comes first. g.Wait() blocks until all goroutines finish and returns the first non-nil error. This auto-cancellation only happens with WithContext: a plain group (var g errgroup.Group or new(errgroup.Group)) still returns the first error from Wait, but it has no context to cancel, so every goroutine runs to completion.

Important: cancelling the context doesn't kill goroutines — they have to check ctx.Done() themselves. If a goroutine ignores the context, it keeps running until it finishes on its own.

errgroup with Concurrency Limit

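errgroup can also cap how many goroutines run at once with SetLimit. The method ships with the golang.org/x/sync module rather than with Go itself, so whether you have it depends on the x/sync version in your go.mod, not on your Go release.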

func main() {
    g, ctx := errgroup.WithContext(context.Background())
    g.SetLimit(3) // max 3 concurrent goroutines

    urls := []string{
        "https://go.dev",
        "https://pkg.go.dev",
        "https://github.com",
        "https://example.com",
        "https://httpbin.org/get",
    }

    for _, url := range urls {
        url := url // copy the loop variable (required before Go 1.22, redundant since)
        g.Go(func() error {
            req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
            if err != nil {
                return fmt.Errorf("%s: %w", url, err)
            }

            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                return fmt.Errorf("%s: %w", url, err)
            }
            resp.Body.Close()

            fmt.Printf("%s: %d\n", url, resp.StatusCode)
            return nil
        })
    }

    if err := g.Wait(); err != nil {
        fmt.Println("error:", err)
    }
}

SetLimit(3) means at most 3 goroutines run at once: while 3 are active, the next g.Go call blocks until one of them returns. This gives errgroup semaphore behavior, so there's no need for a separate semaphore.

Notice that http.NewRequestWithContext(ctx, ...) ties each request to the errgroup's context. If one request fails, the context is cancelled and in-flight requests are aborted promptly, with Do returning an error. If we used http.Get(url) instead (no context), the other requests would keep running until they finish on their own.

Collecting All Errors

Sometimes you want every error, not just the first one. errgroup only returns the first. For all errors, use a channel or a mutex-protected slice.

func main() {
    var (
        mu   sync.Mutex
        errs []error
        wg   sync.WaitGroup
    )

    for i := 1; i <= 5; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            if err := doWork(id); err != nil {
                mu.Lock()
                errs = append(errs, err)
                mu.Unlock()
            }
        }(i)
    }

    wg.Wait()

    if len(errs) > 0 {
        fmt.Printf("%d errors:\n", len(errs))
        for _, err := range errs {
            fmt.Println(" -", err)
        }
    }
}

Once wg.Wait() has returned, you can merge the slice into one error with errors.Join (Go 1.20+):

wg.Wait()

if len(errs) > 0 {
    combined := errors.Join(errs...)
    fmt.Println(combined)

    // the combined error is unwrappable, so you can check for specific errors
    // (ErrTimeout stands for a sentinel defined elsewhere, e.g. errors.New("timeout"))
    if errors.Is(combined, ErrTimeout) {
        // handles any ErrTimeout in the list
    }
}

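errors.Join merges the slice into a single error that works with errors.Is() and errors.As(), and it returns nil when the slice is empty, so the len check is optional. errors.Join itself is fine to call from anywhere; what isn't safe is reading errs while goroutines may still be appending to it. Call it after wg.Wait(), once every goroutine is done and the slice is fully populated.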

Panic Recovery in Goroutines

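An unrecovered panic in any goroutine crashes the entire program, and the goroutine that launched it can't catch it: recover only works inside a deferred function running in the panicking goroutine. Always recover in goroutines that might panic.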

func safeGo(fn func(), wg *sync.WaitGroup) {
    go func() {
        defer wg.Done()
        defer func() {
            if r := recover(); r != nil {
                fmt.Println("recovered from panic:", r)
            }
        }()
        fn()
    }()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(1)
    safeGo(func() {
        panic("something went wrong")
    }, &wg)

    wg.Wait()
    fmt.Println("program still running")
}

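In production, log the panic together with a stack trace (runtime/debug.Stack() returns the current goroutine's trace when called from the deferred function) and then continue. A safeGo wrapper like this is a common utility.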

Which Approach to Use

Scenario	Tool
Stop on first error	`errgroup.WithContext`
Stop on first error + limit concurrency	`errgroup` with `SetLimit`
Collect all errors	Mutex + slice, or buffered error channel
Fire-and-forget with safety	`safeGo` wrapper with panic recovery
Custom cancellation logic	Error channel + context.WithCancel

For most cases, errgroup is the right answer. It handles the WaitGroup, context cancellation, and first-error collection in one package.

Key Takeaways

Goroutines can't return errors — use channels or errgroup
Buffer error channels to prevent goroutines from blocking
errgroup.WithContext cancels the shared context on the first error; goroutines stop only if they check it
SetLimit(n) on an errgroup combines error handling with concurrency limiting
Collect all errors with a mutex-protected slice when you need every failure
Always recover panics in goroutines — an unrecovered panic kills the program
errgroup is the standard tool — use it unless you need custom behavior