05 - Proxies and Service Communication


In distributed systems, services rarely talk directly to each other or to clients without intermediaries. Proxies sit between communicating parties to add security, performance, and reliability. Understanding how services find and communicate with each other is fundamental to designing scalable architectures.

Forward Proxy vs Reverse Proxy

A proxy is just a server that sits between two parties and relays traffic. The difference is which side it represents.

A forward proxy sits in front of clients. The client sends requests to the proxy, which forwards them to the destination server. The server never sees the client's real IP — it only sees the proxy's IP.

Client ──▶ Forward Proxy (hides client) ──▶ Internet ──▶ Server

Use cases for forward proxies:

VPNs: route traffic through another network, mask client location. (Strictly, a VPN tunnels traffic at the network layer rather than relaying individual requests, but it hides the client in the same way.)
Corporate proxies: enforce access policies, block certain sites
Caching proxies: cache frequently accessed content closer to clients. A company with 1,000 employees all accessing the same resources saves bandwidth by caching at the proxy. Squid is the classic open-source forward caching proxy.

A reverse proxy sits in front of servers. The client sends requests to the proxy, which forwards them to one of many backend servers. The client never sees the backend servers directly — it only knows the proxy's address.

Client ──▶ Reverse Proxy (hides servers) ──▶ Backend Server 1
                                          ──▶ Backend Server 2
                                          ──▶ Backend Server 3

Use cases for reverse proxies:

Nginx, HAProxy — route traffic to healthy backend instances
Cloudflare — DDoS protection, edge caching, WAF
AWS ALB/CloudFront — managed reverse proxy with auto-scaling integration
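To make "route traffic to healthy backend instances" concrete, here is a minimal sketch of health-aware round-robin selection in Python. The class and method names are hypothetical, and a real proxy such as Nginx or HAProxy would drive the health state from active health checks rather than manual calls.

```python
from itertools import count


class RoundRobinProxy:
    """Picks the next healthy backend in rotation (illustrative only)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)  # assumption: all healthy at start
        self._counter = count()

    def mark_down(self, backend):
        # A real proxy would call this when a health check fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, skipping unhealthy ones.
        for _ in range(len(self.backends)):
            index = next(self._counter) % len(self.backends)
            candidate = self.backends[index]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

The client only ever talks to the proxy, so backends can be marked down or replaced without the client noticing.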

Reverse Proxy Responsibilities

Responsibility	What It Does
SSL termination¹	Decrypts HTTPS at the proxy so backends handle plain HTTP
Compression	Gzips responses before sending to clients
Caching	Serves repeated responses without hitting backends
Load balancing	Distributes requests across multiple backend instances
WAF (Web Application Firewall)	Inspects requests and blocks SQL injection, XSS, bot traffic before it reaches your app — like a bouncer that reads every request and rejects the malicious ones
Rate limiting	Throttles abusive clients before they reach your app

Nearly every production system sits behind a reverse proxy. If you deploy without one, you end up handling all of the above in application code, which is slower, harder to maintain, and less secure.

¹ "Termination" here means the encrypted connection ends (is terminated) at that point — not that something is being shut down.
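As an illustration of one responsibility from the table, rate limiting, here is a minimal token-bucket sketch in Python. It shows one common algorithm, not how any particular proxy implements it; the class name and the injectable clock parameter are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token for this request
            return True
        return False  # caller would typically answer HTTP 429
```

A proxy would usually keep one bucket per client (for example, keyed by IP address or API key) so that one abusive client cannot exhaust the budget of others.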

Service-to-Service Communication

Once you have multiple services, they need to talk to each other. Two fundamental patterns:

Synchronous Communication

The caller sends a request and waits for a response. Like a phone call — you ask a question and stay on the line until you get an answer.

HTTP/REST — simple, widely supported, human-readable. Works well for CRUD operations. Every developer knows it. Debugging is easy (just curl it).
gRPC — binary protocol over HTTP/2. Strongly typed via Protocol Buffers. Smaller payloads and efficient serialization compared to JSON/REST. Ideal for internal service-to-service calls where both sides are under your control.

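The downside of sync: if the downstream service is slow or down, the caller is stuck waiting. The caller's availability is now tied to the callee's, and failures can cascade: one slow service ties up threads and connections in every service waiting on it.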
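A common mitigation is to put a deadline on every synchronous call. The sketch below, using only the Python standard library, wraps an arbitrary call in a timeout and returns a fallback when the deadline passes. The function name and the fallback strategy are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for outbound calls (size chosen arbitrarily here).
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, fallback):
    """Run fn() but stop waiting after timeout_s seconds.

    Note: the worker thread keeps running after a timeout; only the
    caller is released. Exceptions raised by fn() propagate unchanged.
    """
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback


def slow_profile_lookup():
    # Stand-in for a downstream service that has become slow.
    time.sleep(0.3)
    return {"name": "Ada"}


profile = call_with_timeout(slow_profile_lookup, 0.05, fallback=None)
```

In practice most HTTP and gRPC clients accept a timeout or deadline parameter directly, which is preferable because the underlying connection is actually abandoned.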

Asynchronous Communication

The caller sends a message and moves on. No waiting. Like dropping a letter in a mailbox — you don't stand there until it's delivered.

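Message queues (SQS, RabbitMQ): point-to-point. Any number of producers can send, but each message is processed by only one consumer, even when several consumers share the queue. Delivery is typically at-least-once, so consumers should tolerate duplicates. The consumer processes at its own pace.
Event streams (Kafka, Kinesis): publish-subscribe. Multiple consumers read the same stream independently. Events are retained for a configured period, so consumers can replay them.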

The downside of async: you don't get an immediate result. If you need to show the user a response that depends on the processing, async doesn't work alone.
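The difference between a queue and a stream is easiest to see in a toy in-memory model. This Python sketch is an assumption-laden simplification (no persistence, acknowledgments, or redelivery), and the class names are hypothetical, but it captures the consumption semantics of each.

```python
from collections import deque


class WorkQueue:
    """Point-to-point: a received message is gone for everyone else."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer will see it.
        return self._messages.popleft() if self._messages else None


class EventStream:
    """Publish-subscribe log: events stay; each consumer tracks an offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # consumer name -> index of next unread event

    def publish(self, event):
        self._log.append(event)

    def poll(self, consumer):
        offset = self._offsets.get(consumer, 0)
        events = self._log[offset:]
        self._offsets[consumer] = len(self._log)
        return events

    def rewind(self, consumer, offset=0):
        # Replay: move the consumer's offset back to re-read old events.
        self._offsets[consumer] = offset
```

With the queue, two workers calling receive() split the messages between them. With the stream, a billing consumer and an email consumer each see every event, and either can rewind to replay.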

Choosing Sync vs Async

Scenario	Pattern	Why
Fetching user profile	Sync (REST/gRPC)	Client needs the data immediately
Placing an order	Sync request + async processing	Validate, check inventory, and charge payment synchronously (user needs immediate feedback). Then handle fulfillment, shipping, and notifications asynchronously via queue.
Sending notifications	Async (queue)	User doesn't wait for email/SMS delivery
Propagating state changes	Async (event stream)	Multiple services react independently
Real-time queries/search	Sync (gRPC)	Low-latency response required

Rule of thumb: stay synchronous when the caller needs the result to continue (reads, queries, validation), and go asynchronous for work that can finish later (fulfillment, events, notifications). The "accept fast, process later" pattern (sync acknowledgment + async work) is one of the most common in production systems: it keeps response times low while handling heavy processing reliably.
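The "accept fast, process later" pattern can be sketched with a standard-library queue and a background worker. Everything here is hypothetical (the function names, the order fields, the in-memory results dict); a production system would use a durable queue such as SQS or RabbitMQ instead of an in-process one.

```python
import queue
import threading
import uuid

jobs = queue.Queue()
results = {}  # order_id -> outcome, filled in by the worker


def place_order(order):
    """Synchronous part: validate, enqueue, and return immediately."""
    if order.get("quantity", 0) <= 0:
        return {"status": "rejected", "reason": "quantity must be positive"}
    order_id = str(uuid.uuid4())
    jobs.put((order_id, order))  # hand the slow work to the worker
    return {"status": "accepted", "order_id": order_id}


def worker():
    """Asynchronous part: drain the queue at its own pace."""
    while True:
        order_id, order = jobs.get()
        # Stand-in for fulfillment, shipping, and notifications.
        results[order_id] = f"shipped {order['quantity']} x {order['sku']}"
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The caller gets an order ID right away and can poll or be notified later, while the worker absorbs spikes by letting the queue grow.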

Service Discovery

Problem: In a dynamic environment, services scale up and down. IP addresses change. How does Service A find a healthy instance of Service B?

Imagine your order service needs to call the payment service. Yesterday payment ran on 3 instances. Today it auto-scaled to 7. One crashed and was replaced. The IPs are completely different. Hardcoding IPs is impossible — you need a way to ask "where is payment right now?"

A service registry is a database of running instances. Services register themselves on startup and deregister on shutdown. Tools like Consul, etcd, and ZooKeeper serve this role.
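A registry can be pictured as a map from service name to live instances, with a heartbeat timeout to evict instances that crash without deregistering. The sketch below is a toy in-memory version; the class, its methods, and the TTL value are assumptions for illustration and do not mirror the API of Consul, etcd, or ZooKeeper.

```python
import time


class ServiceRegistry:
    """Toy registry: instances must heartbeat within ttl_s to stay listed."""

    def __init__(self, ttl_s=10.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._instances = {}  # service name -> {address: last heartbeat}

    def register(self, service, address):
        self._instances.setdefault(service, {})[address] = self.clock()

    def heartbeat(self, service, address):
        # A heartbeat simply refreshes the instance's timestamp.
        self.register(service, address)

    def deregister(self, service, address):
        self._instances.get(service, {}).pop(address, None)

    def lookup(self, service):
        now = self.clock()
        entries = self._instances.get(service, {})
        # Keep only instances whose last heartbeat is within the TTL.
        live = {a: t for a, t in entries.items() if now - t <= self.ttl_s}
        self._instances[service] = live
        return sorted(live)
```

Graceful shutdowns call deregister(); crashed instances simply stop sending heartbeats and drop out of lookup() once the TTL passes.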

Client-Side Discovery

The client (the service making the call — e.g., the order service calling payment) queries the service registry to get a list of available instances, then picks one (round-robin, random, least connections).

Pro: client controls load balancing strategy
Con: every client needs discovery logic; tightly coupled to registry
Developer experience: your code fetches instance list from registry, then calls one directly
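A rough sketch of what that discovery logic looks like inside the calling service, here using a least-connections strategy. The `lookup` callable stands in for a registry query and `send` for the actual network call; both are hypothetical placeholders, as is the class itself.

```python
import random


class DiscoveryClient:
    """Client-side discovery with least-connections selection (illustrative)."""

    def __init__(self, lookup):
        self.lookup = lookup  # callable: service name -> list of addresses
        self.in_flight = {}   # address -> number of requests in progress

    def choose(self, service):
        instances = self.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        fewest = min(self.in_flight.get(a, 0) for a in instances)
        candidates = [a for a in instances if self.in_flight.get(a, 0) == fewest]
        return random.choice(candidates)  # break ties randomly

    def call(self, service, send):
        address = self.choose(service)
        self.in_flight[address] = self.in_flight.get(address, 0) + 1
        try:
            return send(address)  # e.g. an HTTP request to that address
        finally:
            self.in_flight[address] -= 1
```

This is the coupling the "Con" refers to: every calling service, in every language, needs an equivalent of this class and a dependency on the registry.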

Server-Side Discovery

The client sends requests to a load balancer (AWS ALB, Kubernetes Service). The load balancer queries the registry and routes the request to a healthy instance.

Pro: clients are simple — just hit one endpoint
Con: extra network hop; load balancer can become a bottleneck
Developer experience: your code just calls http://payment-service/charge. In Kubernetes, cluster DNS resolves the Service name to a stable virtual IP, and kube-proxy forwards the connection to a ready pod. You never see pod IPs.

DNS-Based Discovery

Instead of querying a registry through an SDK, callers look services up by DNS name (e.g., payment.internal). Instances are registered under that name, and when the caller resolves it, the DNS server returns the IPs of healthy instances. Short TTLs (5–30s) help stale entries expire quickly.

Example: Consul can expose services as DNS records. Your code calls payment.service.consul — DNS resolves it to [10.0.1.5, 10.0.1.6, 10.0.1.9] and your HTTP client picks one.

Pro: universal — every language/framework understands DNS, no special SDK needed
Con: DNS caching (at OS, resolver, or library level) can route to dead instances even after TTL expires; no per-request load balancing — the client caches the resolved IP for the TTL duration
Developer experience: same as calling any domain name — resolution happens transparently
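The caching caveat can be seen in a small sketch. The `resolve` function uses the standard `socket.getaddrinfo` call; the `CachingResolver` class, its TTL, and its injectable parameters are assumptions that imitate what an OS resolver or HTTP library might do internally.

```python
import socket
import time


def resolve(hostname, port):
    """Return the distinct IPs the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Caches answers for ttl_s seconds, like many client libraries do."""

    def __init__(self, ttl_s=5.0, resolve_fn=resolve, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.resolve_fn = resolve_fn
        self.clock = clock
        self._cache = {}  # (hostname, port) -> (expires_at, ips)

    def lookup(self, hostname, port=80):
        now = self.clock()
        cached = self._cache.get((hostname, port))
        if cached and now < cached[0]:
            # Served from cache: may include instances that died since.
            return cached[1]
        ips = self.resolve_fn(hostname, port)
        self._cache[(hostname, port)] = (now + self.ttl_s, ips)
        return ips
```

Until the cached entry expires, every request goes to whatever IPs were returned the first time, which is why a dead instance can keep receiving traffic for up to one TTL (or longer if another layer caches too).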

Choosing a Discovery Pattern

Pattern	Best for	Example
Client-side	Fine-grained control over routing (custom load balancing, failover logic)	Netflix Eureka: a service registry where instances register themselves; the caller gets the full list and picks based on latency/zone
Server-side	Simplicity — callers don't need discovery logic	Kubernetes Services, AWS ALB — just call a stable endpoint
DNS-based	Polyglot environments where services use different languages/frameworks	Consul DNS — no SDK needed, any HTTP client works

Default choice: server-side (Kubernetes or a load balancer). It's the simplest for developers and handles most cases. Use client-side when you need custom routing logic. Use DNS-based when you can't add a library or sidecar to every service.

In practice, Kubernetes uses server-side discovery (ClusterIP Services + kube-proxy). Consul supports both client-side lookups through its HTTP API and DNS-based lookups, while Eureka is typically used client-side through its client libraries.

In system design interviews, mention service discovery when you have multiple instances of a service. It signals you understand that services don't have fixed addresses in production.

Sidecar Pattern and Service Mesh

The Sidecar Pattern

A sidecar is a helper process deployed alongside your service (same pod in Kubernetes, same host). It handles cross-cutting concerns so your application code stays focused on business logic.

Without a sidecar, every service implements its own retry logic, TLS certificates, metrics collection, and circuit breaking. With 50 services, that's 50 implementations of the same concerns — each slightly different, each a potential bug.

A sidecar handles all of this in one place:

Mutual TLS (mTLS) between services — both sides verify each other's certificates, so only authorized services can communicate (unlike regular TLS where only the client verifies the server)
Automatic retries with exponential backoff
Circuit breaking
Distributed tracing and metrics collection
Traffic shaping (canary deployments, A/B routing)
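To show the kind of logic a sidecar takes out of application code, here is a minimal sketch of retries with exponential backoff and full jitter. The function name, default values, and injectable `sleep` and `rng` parameters are assumptions chosen for the example, not the behavior of any particular proxy.

```python
import random
import time


def retry(fn, attempts=4, base_delay_s=0.1, max_delay_s=2.0,
          sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on any exception with exponential backoff.

    The delay ceiling doubles each attempt (0.1s, 0.2s, 0.4s, ...) up to
    max_delay_s; the actual wait is a random fraction of that ceiling
    ("full jitter") so that many clients do not retry in lockstep.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(ceiling * rng())
```

Retries are only safe for idempotent operations; blindly retrying a non-idempotent call such as a payment charge can apply it twice.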

Service Mesh

A service mesh deploys a sidecar proxy next to every service (Envoy in Istio, linkerd2-proxy in Linkerd). A control plane configures all the sidecars centrally.

What the mesh handles:

mTLS everywhere — zero-trust networking without app changes
Traffic routing — shift 5% of traffic to a new version
Circuit breaking — stop calling a failing service
Observability — automatic request tracing, latency histograms, error rates
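Circuit breaking is the least self-explanatory item in that list, so here is a minimal sketch of the state machine behind it. The class name, thresholds, and injectable clock are assumptions for illustration; mesh proxies implement richer variants configured through the control plane.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            trial_failed = self.opened_at is not None
            if trial_failed or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open)
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on timeouts, which gives the failing service room to recover and keeps the failure from cascading upstream.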

When You Need It

Large microservice deployments (50+ services)
Implementing retries, TLS, and observability in each service individually is impractical
You need consistent security policies across all services

When You Don't

Small teams with fewer than 10 services — the operational overhead isn't worth it
Monolith or modular monolith — no inter-service network calls to manage
Team lacks Kubernetes expertise — a mesh adds significant complexity

Key Takeaways

Reverse proxies are non-negotiable in production — they handle SSL, caching, rate limiting, and load balancing outside your application code.
Use sync communication (REST/gRPC) when the caller needs an immediate response; use async (queues/streams) when work can be deferred or multiple consumers need the same event.
Service discovery solves the problem of finding healthy instances in dynamic environments — server-side discovery (load balancer) is the simplest starting point.
gRPC is usually a better fit than REST for internal service-to-service calls: binary encoding and HTTP/2 multiplexing cut overhead, and Protocol Buffers schemas give both sides a strongly typed contract.
A service mesh is powerful but expensive in complexity — adopt it when you have enough services that implementing cross-cutting concerns individually becomes unsustainable.
Start simple: reverse proxy → direct HTTP calls → add a message queue for async work → adopt service discovery as you scale → consider a mesh only when the pain justifies it.