14 - Profiling & Benchmarks
You think the JSON marshaling is slow. Your colleague thinks it's the database queries. Neither of you knows. That's the problem. Performance intuition is wrong more often than it's right. The rule: profile first, don't guess.
Go ships with profiling and benchmarking tools in the standard library. No external APM (Application Performance Monitoring) needed for the basics.
Benchmarks with testing.B
Benchmark functions live in `_test.go` files, just like tests. They start with `Benchmark` and take `*testing.B`:
```go
// store_test.go
func BenchmarkBookmarkStore_List(b *testing.B) {
	db := setupTestDB(b) // setupTestDB accepts testing.TB, so it works with both *testing.T and *testing.B
	store := NewBookmarkStore(db)

	// Seed some data
	ctx := context.Background()
	for i := 0; i < 100; i++ {
		store.Create(ctx, fmt.Sprintf("https://example.com/%d", i), fmt.Sprintf("Bookmark %d", i))
	}

	b.ResetTimer() // Don't count setup time
	for b.Loop() {
		store.List(ctx)
	}
}
```

`b.Loop()` is the idiomatic way to write benchmark loops since Go 1.24. The framework decides how many iterations to run to get a stable measurement. `b.ResetTimer()` excludes setup from the timing.

If you're on Go 1.23 or earlier, use `for i := 0; i < b.N; i++` instead of `b.Loop()`.
Run it:

```sh
go test -bench=BenchmarkBookmarkStore_List -benchmem ./api
```

Output:

```text
BenchmarkBookmarkStore_List-8    5000    230145 ns/op    4096 B/op    52 allocs/op
```

That's 5000 iterations at ~230 microseconds per call, with 4KB allocated across 52 allocations per call. Now you have numbers, not guesses.
Benchmarking a Handler
Test the full HTTP path including JSON serialization:
```go
func BenchmarkHandleListBookmarks(b *testing.B) {
	db := setupTestDB(b)
	store := NewBookmarkStore(db)

	ctx := context.Background()
	for i := 0; i < 50; i++ {
		store.Create(ctx, fmt.Sprintf("https://example.com/%d", i), fmt.Sprintf("Bookmark %d", i))
	}

	mux := http.NewServeMux()
	registerRoutes(mux, store)
	req := httptest.NewRequest("GET", "/api/bookmarks", nil)

	b.ResetTimer()
	for b.Loop() {
		rec := httptest.NewRecorder()
		mux.ServeHTTP(rec, req)
	}
}
```

This benchmarks everything: routing, handler logic, database query, JSON encoding, response writing. If it's slow, you can then benchmark individual pieces to find the bottleneck.
Useful Flags
```sh
# Run all benchmarks
go test -bench=. ./api

# Run benchmarks matching a pattern
go test -bench=Store ./api

# Include memory stats
go test -bench=. -benchmem ./api

# Run for longer (more stable results)
go test -bench=. -benchtime=5s ./api

# Compare before/after with count
go test -bench=. -count=6 ./api > old.txt
# ... make changes ...
go test -bench=. -count=6 ./api > new.txt
```

For comparing benchmark results, install benchstat:

```sh
go install golang.org/x/perf/cmd/benchstat@latest
benchstat old.txt new.txt
```

It gives you a statistical comparison with confidence intervals. Much better than eyeballing numbers.
Runtime Profiling with pprof
Benchmarks measure specific functions. pprof profiles the running application. It answers: where is my program spending its time right now?
Add the pprof handler to your server:
```go
import _ "net/http/pprof"

func main() {
	// ... your routes ...

	// pprof registers on DefaultServeMux automatically.
	// Run it on a separate port so it's not exposed publicly.
	go func() {
		slog.Info("pprof listening", "addr", ":6060")
		http.ListenAndServe(":6060", nil)
	}()

	// ... start main server ...
}
```

The blank import registers handlers on `http.DefaultServeMux`. We run it on a separate port (6060) so it's never exposed to the internet. In production, only expose this internally or behind authentication.
Collecting Profiles
With the server running, grab a CPU profile:
```sh
# 30-second CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
```

This starts a 30-second sampling session. While it runs, hit your API with some traffic. Then pprof drops you into an interactive shell:
```text
(pprof) top10
Showing nodes accounting for 2.5s, 83% of 3s total
      flat  flat%   sum%        cum   cum%
      0.8s 26.67% 26.67%       0.8s 26.67%  runtime.cgocall
      0.5s 16.67% 43.33%       0.5s 16.67%  encoding/json.(*encodeState).marshal
      ...
```

`flat` is time spent in the function itself. `cum` (cumulative) includes time in functions it calls. If `cum` is high but `flat` is low, the function is slow because of what it calls, not what it does.
Memory Profiles
```sh
# Heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Allocations profile (total allocations, not just live)
go tool pprof -alloc_space http://localhost:6060/debug/pprof/heap
```

In the pprof shell:

```text
(pprof) top10
(pprof) list handleListBookmarks   # source-level annotation
(pprof) web                        # open graph in browser (needs graphviz)
```

If you prefer a visual UI instead of the interactive shell, use:

```sh
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap
```

This opens a browser with flame graphs, call graphs, and source annotations. Easier than memorizing pprof commands.
Goroutine and Block Profiles
```sh
# How many goroutines and what they're doing
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Where goroutines block on synchronization
go tool pprof http://localhost:6060/debug/pprof/block
```

The goroutine profile is great for finding leaks. If the count keeps growing, something is spawning goroutines without cleaning them up.
The Workflow
- Notice something is slow (or get a report)
- Write a benchmark for the specific operation
- Run with `-benchmem` to see allocations
- If you need more detail, use pprof on the running server
- Find the hot spot
- Fix it
- Run the benchmark again to confirm the improvement
Don't optimize without measuring. Don't measure without a reason. Most code doesn't need to be fast. The 5% that does will show up in profiles.
Applying to Our Project
Add benchmarks to `store_test.go` and `handler_test.go`:
```go
// store_test.go
func BenchmarkBookmarkStore_Create(b *testing.B) {
	db := setupTestDB(b)
	store := NewBookmarkStore(db)
	ctx := context.Background()

	for b.Loop() {
		store.Create(ctx, "https://go.dev", "Go")
	}
}
```

Add pprof to main.go on a debug port:
```go
import _ "net/http/pprof"

// In main(), before the main server starts:
go func() {
	slog.Info("pprof listening", "addr", ":6060")
	http.ListenAndServe(":6060", nil)
}()
```

Run benchmarks as part of your development cycle:

```sh
go test -bench=. -benchmem ./api
```

Key Takeaways
- Profile first, don't guess. Performance intuition is unreliable
- Benchmark functions start with `Benchmark`, take `*testing.B`, and use `b.Loop()`
- `b.ResetTimer()` excludes setup from measurements. `-benchmem` shows allocations
- `net/http/pprof` profiles the running application: CPU, memory, goroutines, blocking
- Run pprof on a separate port. Never expose it publicly
- `go tool pprof` gives you an interactive shell with `top`, `list`, and `web` commands
- Use `benchstat` to compare before/after results with statistical confidence
- The workflow: measure, find the hot spot, fix it, measure again