Service Mesh
A service mesh is a dedicated infrastructure layer for managing service-to-service communication in a microservices architecture. Instead of each service implementing networking concerns (retries, encryption, load balancing) in application code, a mesh of sidecar proxies handles them transparently.
The Problem
In a microservices system with 50+ services, every service needs:
- Mutual TLS (mTLS) for encrypted communication
- Retries with exponential backoff
- Circuit breaking to avoid cascading failures
- Load balancing across instances
- Observability (latency metrics, distributed traces)
Without a service mesh, each team implements these in its own language and framework. The result is inconsistent behavior, duplicated effort, and bugs that differ from one service to the next.
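To make the duplication concrete, here is just the retry-with-backoff piece written as application code. It is a minimal Python sketch, not code from any mesh or library: `retry_with_backoff` is a hypothetical helper, transient failures are assumed to surface as `ConnectionError`, and the delay values are illustrative.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call()` until it succeeds or `max_attempts` attempts have failed.

    Before retry n (n = 1, 2, ...) the cap is base_delay * 2**(n - 1), limited
    to max_delay; the actual wait is drawn uniformly from [0, cap] ("full
    jitter") so that many clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Usage: a hypothetical dependency that fails twice, then succeeds.
attempts: List[int] = []
waits: List[float] = []


def flaky_dependency() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = retry_with_backoff(flaky_dependency, sleep=waits.append)
print(result, len(attempts))  # ok 3
```

Now multiply this by circuit breaking, TLS setup, and metrics, and by every language in the system; a mesh moves exactly this kind of logic out of the application and into the proxy.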
Sidecar Proxy Pattern
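A lightweight proxy (typically Envoy) is deployed alongside every service instance. All inbound and outbound traffic flows through this sidecar, and the application itself just makes plain HTTP/gRPC calls as if the proxy were not there. The sidecar intercepts that traffic transparently; in Kubernetes-based meshes such as Istio and Linkerd this is typically done with iptables rules that redirect the pod's traffic to the proxy.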
     Service A                        Service B
┌──────────────────┐             ┌──────────────────┐
│ App (port 8080)  │             │ App (port 8080)  │
│        ↕         │             │        ↕         │
│  Sidecar Proxy   │──── mTLS ───│  Sidecar Proxy   │
│  (Envoy :15001)  │             │  (Envoy :15001)  │
└──────────────────┘             └──────────────────┘
The application sends a request to http://service-b:8080. The local sidecar intercepts it, applies policies (retry, timeout, circuit breaking), picks one of Service B's instances (load balancing), and forwards the request over an mTLS connection. Service B's sidecar decrypts it and delivers it to the local app.
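The sidecar's outbound path can be modeled in a few dozen lines. The sketch below is a toy, not how Envoy is implemented: `SidecarOutbound`, `Endpoint`, and the injected `send` transport are hypothetical names, requests and responses are plain strings, and mTLS, timeouts, and metrics are left out. It shows round-robin load balancing, a per-endpoint circuit breaker that ejects an instance after consecutive failures, and a retry on the next healthy instance.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Endpoint:
    address: str
    consecutive_failures: int = 0
    ejected_until: float = 0.0  # clock time before which this endpoint is skipped


class SidecarOutbound:
    """Toy model of a sidecar's outbound path to one upstream service.

    - Load balancing: round-robin across the service's endpoints.
    - Circuit breaking: after `max_failures` consecutive failures an endpoint
      is ejected (skipped) for `ejection_seconds`.
    - Retries: a failed attempt is retried on the next healthy endpoint,
      up to `max_attempts` attempts in total.
    """

    def __init__(
        self,
        addresses: List[str],
        send: Callable[[str, str], str],  # hypothetical transport: (address, request) -> response
        clock: Callable[[], float] = time.monotonic,
        max_failures: int = 3,
        ejection_seconds: float = 30.0,
        max_attempts: int = 2,
    ) -> None:
        if not addresses or max_attempts < 1:
            raise ValueError("need at least one endpoint and at least one attempt")
        self.endpoints = [Endpoint(a) for a in addresses]
        self._ring = itertools.cycle(self.endpoints)
        self.send = send
        self.clock = clock
        self.max_failures = max_failures
        self.ejection_seconds = ejection_seconds
        self.max_attempts = max_attempts

    def _next_healthy(self) -> Endpoint:
        now = self.clock()
        for _ in range(len(self.endpoints)):
            endpoint = next(self._ring)
            if endpoint.ejected_until <= now:
                return endpoint
        raise ConnectionError("no healthy endpoints: every circuit is open")

    def request(self, req: str) -> str:
        last_error: Exception = ConnectionError("no attempt made")
        for _ in range(self.max_attempts):
            endpoint = self._next_healthy()
            try:
                response = self.send(endpoint.address, req)
            except ConnectionError as err:
                last_error = err
                endpoint.consecutive_failures += 1
                if endpoint.consecutive_failures >= self.max_failures:
                    # Trip the breaker: stop sending to this endpoint for a while.
                    endpoint.ejected_until = self.clock() + self.ejection_seconds
                    endpoint.consecutive_failures = 0
                continue  # retry on the next healthy endpoint
            endpoint.consecutive_failures = 0
            return response
        raise last_error


# Usage: three instances of a hypothetical Service B, one of them broken.
calls: List[str] = []


def fake_send(address: str, req: str) -> str:
    calls.append(address)
    if address == "10.0.0.2":
        raise ConnectionError("connection refused")
    return f"{address} handled {req}"


now = [0.0]  # fake clock so the example is deterministic
proxy = SidecarOutbound(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"], fake_send, clock=lambda: now[0], max_failures=2
)
responses = [proxy.request(f"req{i}") for i in range(6)]
print(calls.count("10.0.0.2"))  # 2: ejected after its second consecutive failure
```

Once the broken instance is ejected, later requests skip it instead of paying for a failed attempt first, and the calling application never sees the failures that a retry absorbed.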
Control Plane vs Data Plane
Data plane — the sidecar proxies themselves. They handle actual traffic. Every service instance has one.
Control plane — the central brain that configures all sidecars. It pushes routing rules, TLS certificates, retry policies, and traffic-splitting configs to every proxy.
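Real control planes stream configuration to their proxies over a dedicated protocol (Istio's istiod, for example, serves Envoy over the xDS APIs). The sketch below ignores transport entirely and models only the core idea: a versioned config that is pushed to every registered proxy, including proxies that join later. `ControlPlane`, `Sidecar`, `ProxyConfig`, and the `cert_serial` field are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProxyConfig:
    version: int
    routes: Dict[str, str]  # host -> destination subset, e.g. "reviews" -> "reviews-v1"
    retry_attempts: int
    cert_serial: str  # stand-in for the workload certificate issued by the CA


@dataclass
class Sidecar:
    name: str
    config: Optional[ProxyConfig] = None

    def apply(self, cfg: ProxyConfig) -> None:
        # Ignore stale pushes so an out-of-order delivery cannot roll config back.
        if self.config is None or cfg.version > self.config.version:
            self.config = cfg


class ControlPlane:
    """Holds the desired config and pushes every new version to all proxies."""

    def __init__(self) -> None:
        self.sidecars: List[Sidecar] = []
        self.current: Optional[ProxyConfig] = None

    def register(self, sidecar: Sidecar) -> None:
        self.sidecars.append(sidecar)
        if self.current is not None:
            sidecar.apply(self.current)  # a new proxy starts from the latest config

    def publish(self, routes: Dict[str, str], retry_attempts: int, cert_serial: str) -> ProxyConfig:
        version = 1 if self.current is None else self.current.version + 1
        self.current = ProxyConfig(version, dict(routes), retry_attempts, cert_serial)
        for sidecar in self.sidecars:
            sidecar.apply(self.current)
        return self.current


# Usage: two proxies, a route change plus certificate rotation, then a late joiner.
cp = ControlPlane()
proxy_a, proxy_b = Sidecar("proxy-a"), Sidecar("proxy-b")
cp.register(proxy_a)
cp.register(proxy_b)
cp.publish({"reviews": "reviews-v1"}, retry_attempts=2, cert_serial="serial-001")
cp.publish({"reviews": "reviews-v2"}, retry_attempts=3, cert_serial="serial-002")
proxy_c = Sidecar("proxy-c")
cp.register(proxy_c)
print(proxy_a.config.version, proxy_c.config.cert_serial)  # 2 serial-002
```

The version check in `Sidecar.apply` is a simplified stand-in for the ordering and acknowledgement handling a real push protocol needs; the point is that operators change one desired state and every proxy converges on it.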
        ┌─────────────────────────┐
        │      Control Plane      │
        │  (Istiod / Linkerd CP)  │
        │ - Certificate authority │
        │ - Config distribution   │
        │ - Service discovery     │
        └────────┬────────┬───────┘
                 │ config │ config
            ┌────▼──┐  ┌──▼────┐
            │Sidecar│  │Sidecar│  ← Data Plane
            │Proxy A│  │Proxy B│
            └───────┘  └───────┘
What a Service Mesh Handles
| Concern | Without mesh | With mesh |
|---|---|---|
| Encryption | Each service manages TLS certs | Automatic mTLS, certs rotated by control plane |
| Retries | Coded per service, inconsistent | Configured centrally, with per-route policies |
| Circuit breaking | Library-dependent (Hystrix, etc.) | Proxy-level, language-agnostic |
| Load balancing | Client-side or external LB | Sidecar does L7 balancing with health checks |
| Observability | Manual instrumentation | Automatic latency/error metrics and traces |
| Traffic splitting | Feature flags in code | Route 5% to canary via config |
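Traffic splitting is declared as data (in Istio, for example, as weighted routes in a `VirtualService`), and the proxy turns those weights into a per-request decision. The Python sketch below shows one way to do that arithmetic. `pick_destination` and the `reviews-*` names are hypothetical, and hashing a request key is a design choice assumed here: a uniform random pick per request would also honor the weights, but would bounce a given user between versions.

```python
import hashlib
from typing import Dict, List, Tuple

Route = Tuple[str, int]  # (destination, integer weight)


def pick_destination(routes: List[Route], request_key: str) -> str:
    """Choose a destination for `request_key` in proportion to the route weights.

    The key is hashed to a bucket in [0, total_weight); each route owns a
    contiguous run of buckets as wide as its weight. Hashing a stable key
    (e.g. a user ID) keeps each user on one version for the whole rollout.
    """
    if any(weight < 0 for _, weight in routes):
        raise ValueError("weights must be non-negative")
    total = sum(weight for _, weight in routes)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for destination, weight in routes:
        if bucket < weight:
            return destination
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below total")


# Usage: the "route 5% to canary" policy from the table.
routes: List[Route] = [("reviews-v1", 95), ("reviews-v2-canary", 5)]
counts: Dict[str, int] = {"reviews-v1": 0, "reviews-v2-canary": 0}
for user_id in range(10_000):
    counts[pick_destination(routes, f"user-{user_id}")] += 1
print(counts)  # close to a 95/5 split, not exact
```

Promoting the canary is then a config change (for example 50/50, then 0/100) with no application redeploy.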
Common Service Meshes
- Istio — most feature-rich, uses Envoy sidecars, complex to operate
- Linkerd — lightweight, simpler, purpose-built for Kubernetes
- Consul Connect — HashiCorp's mesh, integrates with Consul service discovery
When You Don't Need One
A service mesh adds operational complexity: more containers, more memory, and more latency (0.5-1 ms per hop with modern proxies such as Envoy or Linkerd's Rust-based proxy). Skip it when:
- You have fewer than ~10 services
- You use a single language/framework that already handles these cross-cutting concerns consistently
- You don't need mTLS between services (e.g., everything runs on a trusted network)
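The per-hop figure adds up along a call chain. As a rough worked example, assuming the hops happen one after another (parallel fan-out calls overlap and add less) and taking the 0.5-1 ms range above at face value, a request that crosses d service-to-service hops pays approximately:

$$
\begin{aligned}
\Delta t_{\text{mesh}} &\approx d \cdot t_{\text{hop}}, \qquad t_{\text{hop}} \in [0.5,\ 1]\ \text{ms} \\
d = 5:\quad \Delta t_{\text{mesh}} &\approx 5 \times [0.5,\ 1]\ \text{ms} = [2.5,\ 5]\ \text{ms}
\end{aligned}
$$

Against a 200 ms end-to-end budget that is roughly 1-2.5%, but against a 10 ms budget it is 25-50%, which is why deep, latency-sensitive call chains deserve a measurement before adopting a mesh.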
Concepts Used
| Concept | Lesson | How it's used here |
|---|---|---|
| Reverse proxy | 05 | Sidecars are reverse proxies deployed per-service |
| Load balancing | 06 | Sidecar handles L7 load balancing across instances |
| Designing for failure | 21 | Circuit breaking and retries prevent cascading failures |
| Monitoring and observability | 20 | Automatic metrics and distributed tracing |