Service Mesh

A service mesh is a dedicated infrastructure layer for managing service-to-service communication in a microservices architecture. Instead of each service implementing networking concerns (retries, encryption, load balancing) in application code, a mesh of sidecar proxies handles them transparently.

The Problem

In a microservices system with 50+ services, every service needs:

Mutual TLS (mTLS) for encrypted communication
Retries with exponential backoff
Circuit breaking to avoid cascading failures
Load balancing across instances
Observability (latency metrics, distributed traces)

Without a service mesh, each team implements these in its own language and framework. The result is inconsistent behavior, duplicated effort, and bugs that differ from one service to the next.
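To make the duplicated effort concrete, here is a minimal sketch of the retry logic each team would otherwise write by hand. The function names (`backoff_delays`, `call_with_retries`), the default values, and the choice to retry only on `ConnectionError` are assumptions made for illustration, not taken from any particular framework.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 2.0) -> List[float]:
    """Return the un-jittered delay before each retry: base * 2**n, capped at cap."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base: float = 0.1,
    cap: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ConnectionError with exponential backoff and jitter."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            # Jitter: sleep a random amount up to the computed delay so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
    raise AssertionError("unreachable")
```

Every service written in a different language needs its own version of this, with its own defaults and its own edge cases, which is the inconsistency a mesh is meant to remove.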

Sidecar Proxy Pattern

A lightweight proxy (typically Envoy) is deployed alongside every service instance. All inbound and outbound traffic flows through the sidecar, so the application just makes plain HTTP/gRPC calls addressed to the destination service and leaves encryption, retries, and routing to the proxy.

Service A                          Service B
┌──────────────────┐              ┌──────────────────┐
│  App (port 8080) │              │  App (port 8080) │
│        ↕         │              │        ↕         │
│  Sidecar Proxy   │──── mTLS ────│  Sidecar Proxy   │
│  (Envoy :15001)  │              │  (Envoy :15001)  │
└──────────────────┘              └──────────────────┘

The application sends a request to http://service-b:8080. The local sidecar intercepts it, applies policies (retries, timeouts, circuit breaking), picks a Service B instance via load balancing, encrypts the request with mTLS, and forwards it. Service B's sidecar decrypts the request and delivers it to the local app.
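One of the policies a sidecar applies on this path is circuit breaking. The sketch below shows the classic closed / open / half-open pattern under stated assumptions: the class name `CircuitBreaker`, its parameters, and the consecutive-failure rule are hypothetical, and real proxies use their own mechanisms (for example connection limits or outlier detection) rather than this exact logic.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the upstream."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast instead of piling load on a failing upstream.
                raise CircuitOpenError("upstream marked unhealthy")
            # Reset window elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Because the logic lives in the proxy, the application only ever sees a fast failure while the circuit is open; it does not need a breaker library of its own.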

Control Plane vs Data Plane

Data plane — the sidecar proxies themselves. They handle actual traffic. Every service instance has one.

Control plane — the central brain that configures all sidecars. It pushes routing rules, TLS certificates, retry policies, and traffic-splitting configs to every proxy.

┌──────────────────────────┐
│      Control Plane       │
│  (Istiod / Linkerd CP)   │
│  - Certificate authority │
│  - Config distribution   │
│  - Service discovery     │
└────────┬────────┬────────┘
         │ config │ config
    ┌────▼──┐ ┌───▼───┐
    │Sidecar│ │Sidecar│  ← Data Plane
    │Proxy A│ │Proxy B│
    └───────┘ └───────┘
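The relationship in the diagram can be sketched in a few lines. This is a toy model under stated assumptions: `ProxyConfig`, `SidecarProxy`, and `ControlPlane` are hypothetical names, in-process method calls stand in for the network push, and only two settings are modeled. It is not how Istiod or Linkerd's control plane is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProxyConfig:
    version: int
    retries: int
    timeout_seconds: float


@dataclass
class SidecarProxy:
    name: str
    config: Optional[ProxyConfig] = None

    def apply(self, new_config: ProxyConfig) -> bool:
        """Accept new_config unless it is not newer than what is already applied."""
        if self.config is not None and new_config.version <= self.config.version:
            return False
        self.config = new_config
        return True


class ControlPlane:
    def __init__(self) -> None:
        self._proxies: List[SidecarProxy] = []
        self._current: Optional[ProxyConfig] = None

    def register(self, proxy: SidecarProxy) -> None:
        """Track a new sidecar and bring it up to date immediately."""
        self._proxies.append(proxy)
        if self._current is not None:
            proxy.apply(self._current)

    def publish(self, retries: int, timeout_seconds: float) -> ProxyConfig:
        """Create the next config version and push it to every known sidecar."""
        next_version = 1 if self._current is None else self._current.version + 1
        self._current = ProxyConfig(next_version, retries, timeout_seconds)
        for proxy in self._proxies:
            proxy.apply(self._current)
        return self._current
```

The point of the split is visible even in the toy: policy is changed in one place (`publish`), and the proxies that carry traffic never need to be redeployed to pick it up.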

What a Service Mesh Handles

Concern	Without mesh	With mesh
Encryption	Each service manages TLS certs	Automatic mTLS, certs rotated by control plane
Retries	Coded per service, inconsistent	Configured centrally, with per-route policies
Circuit breaking	Library-dependent (Hystrix, etc.)	Proxy-level, language-agnostic
Load balancing	Client-side or external LB	Sidecar does L7 balancing with health checks
Observability	Manual instrumentation	Automatic latency/error metrics and traces
Traffic splitting	Feature flags in code	Route 5% to canary via config
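The traffic-splitting row can be illustrated with a short sketch. It assumes a hypothetical `pick_backend` function and uses hash-based bucketing on a request ID so the result is deterministic and easy to test; a real proxy may instead choose randomly per request according to configured weights.

```python
import hashlib


def pick_backend(request_id: str, canary_percent: int = 5) -> str:
    """Return "canary" for roughly canary_percent of request IDs, else "stable"."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the request ID to a stable bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Raising the canary share from 5% to 50% is then a change to one number in configuration rather than a code change and redeploy in every calling service.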

Common Service Meshes

Istio — most feature-rich, uses Envoy sidecars, complex to operate
Linkerd — lightweight, simpler, purpose-built for Kubernetes
Consul Connect — HashiCorp's mesh, integrates with Consul service discovery

When You Don't Need One

A service mesh adds operational complexity: more containers, more memory, and more latency (roughly 0.5-1 ms per hop with modern proxies such as Envoy or Linkerd's proxy). Skip it when:

You have fewer than ~10 services
A single language/framework handles cross-cutting concerns consistently
You don't need mTLS between services (e.g., trusted network)

Concepts Used

Concept	Lesson	How it's used here
Reverse proxy	05	Sidecars are proxies deployed per service instance, acting as a reverse proxy for inbound traffic
Load balancing	06	Sidecar handles L7 load balancing across instances
Designing for failure	21	Circuit breaking and retries prevent cascading failures
Monitoring and observability	20	Automatic metrics and distributed tracing