05 - Proxies and Service Communication

In distributed systems, services rarely talk directly to each other or to clients without intermediaries. Proxies sit between communicating parties to add security, performance, and reliability. Understanding how services find and communicate with each other is fundamental to designing scalable architectures.


Forward Proxy vs Reverse Proxy

A proxy is just a server that sits between two parties and relays traffic. The difference is which side it represents.

A forward proxy sits in front of clients. The client sends requests to the proxy, which forwards them to the destination server. The server sees the proxy's IP rather than the client's real IP, unless the proxy is configured to pass it along (for example in an X-Forwarded-For header).

Client ──▶ Forward Proxy (hides client) ──▶ Internet ──▶ Server

Use cases for forward proxies:

  • VPNs: strictly a network-level tunnel rather than an application proxy, but used for the same purpose of routing traffic through another network and masking client location
  • Corporate proxies: enforce access policies, block certain sites
  • Caching proxies (Squid, the classic open-source forward caching proxy): cache frequently accessed content closer to clients. A company with 1,000 employees all accessing the same resources saves bandwidth by caching at the proxy.
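
As a minimal sketch of what "the client sends requests to the proxy" looks like in code, the snippet below configures Python's standard `urllib` to send traffic through a forward proxy. The proxy address `proxy.corp.example:3128` is a made-up placeholder (3128 is Squid's conventional default port), and no request is actually sent.

```python
import urllib.request


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    # ProxyHandler maps each URL scheme to the proxy that should carry it.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address; nothing goes over the network until
# opener.open(...) is called.
opener = build_proxied_opener("http://proxy.corp.example:3128")
```

Calling `opener.open("http://example.com/")` would then connect to the proxy instead of the destination, which is why the destination sees the proxy's address.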

A reverse proxy sits in front of servers. The client sends requests to the proxy, which forwards them to one of many backend servers. The client never sees the backend servers directly — it only knows the proxy's address.

Client ──▶ Reverse Proxy (hides servers) ──▶ Backend Server 1
                                          ──▶ Backend Server 2
                                          ──▶ Backend Server 3

Use cases for reverse proxies:

  • Nginx, HAProxy — route traffic to healthy backend instances
  • Cloudflare — DDoS protection, edge caching, WAF
  • AWS ALB/CloudFront — managed reverse proxy with auto-scaling integration
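
The routing decision a reverse proxy makes for each request can be sketched roughly as follows. This is an illustrative in-memory model, not how Nginx or HAProxy are implemented; the class name, the backend addresses, and the `mark_down` hook (standing in for real health checks) are all assumptions.

```python
from itertools import count


class ReverseProxy:
    """Picks a backend for each request; clients only ever see the proxy."""

    def __init__(self, backends: list[str]) -> None:
        self.backends = backends
        self.healthy = set(backends)  # a real proxy updates this via health checks
        self._counter = count()

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def pick_backend(self) -> str:
        """Round-robin across the backends, skipping unhealthy ones."""
        for _ in range(len(self.backends)):
            candidate = self.backends[next(self._counter) % len(self.backends)]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


proxy = ReverseProxy(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
proxy.mark_down("10.0.0.2:8080")
picks = [proxy.pick_backend() for _ in range(4)]
# picks alternates between 10.0.0.1 and 10.0.0.3; the downed backend is skipped
```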

Reverse Proxy Responsibilities

| Responsibility | What It Does |
| --- | --- |
| SSL termination¹ | Decrypts HTTPS at the proxy so backends handle plain HTTP |
| Compression | Gzips responses before sending to clients |
| Caching | Serves repeated responses without hitting backends |
| Load balancing | Distributes requests across multiple backend instances |
| WAF (Web Application Firewall) | Inspects requests and blocks SQL injection, XSS, bot traffic before it reaches your app; like a bouncer that reads every request and rejects the malicious ones |
| Rate limiting | Throttles abusive clients before they reach your app |

Nearly every production system puts a reverse proxy in front of its application servers. Without one, you end up handling all of the above in application code, which is typically slower, harder to maintain, and easier to get wrong.

¹ "Termination" here means the encrypted connection ends (is terminated) at that point — not that something is being shut down.
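
Of the responsibilities listed above, rate limiting is the easiest to show in a few lines. A token bucket is one common algorithm for it; the sketch below is an assumption about how a limiter could work, not a description of any particular proxy's implementation. Time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=2)
# Three requests arrive at t=0, one more at t=1.
results = [bucket.allow(now=t) for t in (0.0, 0.0, 0.0, 1.0)]
# results == [True, True, False, True]
```

The third request is throttled because the burst allowance of two is used up; one second later a token has been refilled. A proxy would typically keep one bucket per client IP or API key.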


Service-to-Service Communication

Once you have multiple services, they need to talk to each other. Two fundamental patterns:

Synchronous Communication

The caller sends a request and waits for a response. Like a phone call — you ask a question and stay on the line until you get an answer.

  • HTTP/REST — simple, widely supported, human-readable. Works well for CRUD operations. Every developer knows it. Debugging is easy (just curl it).
  • gRPC — binary protocol over HTTP/2. Strongly typed via Protocol Buffers. Smaller payloads and efficient serialization compared to JSON/REST. Ideal for internal service-to-service calls where both sides are under your control.
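
To get a feel for why binary encodings produce smaller payloads than JSON, the sketch below serializes the same made-up order both ways. Note the loud assumption: `struct` is only a stand-in for a binary format. It is not the Protocol Buffers wire format, which uses field tags and variable-length integers.

```python
import json
import struct

# Hypothetical order record, used only for the size comparison.
order = {"order_id": 12345, "user_id": 678, "amount_cents": 4999}

# JSON repeats the field names as text in every message.
json_payload = json.dumps(order).encode("utf-8")

# Three unsigned 32-bit integers in network byte order: 12 bytes total.
# The field names live in the shared schema, not in the payload.
binary_payload = struct.pack(
    "!III", order["order_id"], order["user_id"], order["amount_cents"]
)

decoded = struct.unpack("!III", binary_payload)
```

The binary payload is a fraction of the JSON size, but it is unreadable without the schema, which is the trade-off between the two bullets above.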

The downside of sync: if the downstream service is slow or down, the caller is stuck waiting. The caller's latency and availability become tied to the downstream's, and without timeouts one slow service can cause cascading failures upstream.
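
One common mitigation is to bound the wait with a timeout and fall back to a degraded answer. The sketch below simulates a slow dependency with `time.sleep`; the function names and the fallback value are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout_s: float, fallback):
    """Run fn in a worker thread; give up and return fallback after timeout_s."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        # Don't block on the slow call; the worker thread finishes on its own.
        executor.shutdown(wait=False)


def slow_payment_service() -> str:
    time.sleep(0.5)  # simulated slow downstream dependency
    return "charged"


result = call_with_timeout(
    slow_payment_service, timeout_s=0.05, fallback="payment unavailable"
)
# result == "payment unavailable"
```

Without the timeout, the caller would be held for the full half second, and in a real system every thread stuck this way is one fewer available to serve other requests.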

Asynchronous Communication

The caller sends a message and moves on. No waiting. Like dropping a letter in a mailbox — you don't stand there until it's delivered.

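  • Message queues (SQS, RabbitMQ): point-to-point. Any number of producers can send, but each message is processed by one consumer. Delivery is typically at-least-once, so consumers should tolerate duplicates. The consumer processes at its own pace.
  • Event streams (Kafka, Kinesis): publish-subscribe. Multiple consumers read the same stream independently. Events are retained for a configurable period, so they can be replayed.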

The downside of async: you don't get an immediate result. If you need to show the user a response that depends on the processing, async doesn't work alone.
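
The difference between queue and stream semantics can be sketched with two small in-memory classes. These are toy models written for this comparison, not the APIs of SQS, RabbitMQ, Kafka, or Kinesis.

```python
from collections import deque
from typing import Optional


class WorkQueue:
    """Queue semantics: each message is handed to exactly one consumer."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class EventStream:
    """Stream semantics: events are retained; each consumer tracks its own offset."""

    def __init__(self) -> None:
        self._log: list[str] = []
        self._offsets: dict[str, int] = {}

    def publish(self, event: str) -> None:
        self._log.append(event)

    def poll(self, consumer: str) -> list[str]:
        start = self._offsets.get(consumer, 0)
        self._offsets[consumer] = len(self._log)
        return self._log[start:]


work = WorkQueue()
work.send("send-email:42")
first = work.receive()   # one worker gets the message
second = work.receive()  # None: nothing left for a second worker

stream = EventStream()
stream.publish("order-placed:42")
billing = stream.poll("billing")
shipping = stream.poll("shipping")  # same event, read independently
```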

Choosing Sync vs Async

| Scenario | Pattern | Why |
| --- | --- | --- |
| Fetching user profile | Sync (REST/gRPC) | Client needs the data immediately |
| Placing an order | Sync request + async processing | Validate, check inventory, and charge payment synchronously (user needs immediate feedback). Then handle fulfillment, shipping, and notifications asynchronously via queue. |
| Sending notifications | Async (queue) | User doesn't wait for email/SMS delivery |
| Propagating state changes | Async (event stream) | Multiple services react independently |
| Real-time queries/search | Sync (gRPC) | Low-latency response required |

Rule of thumb: go sync when the caller needs the result to continue (reads, queries, validation); go async for work that can finish after the response is sent (events, notifications, background processing). The "accept fast, process later" pattern (sync acknowledgment + async work) is one of the most common in production systems: it keeps response times low while handling heavy processing reliably.
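
A rough sketch of "accept fast, process later", using an in-process `queue.Queue` and a background thread as stand-ins for a real message broker and worker service. The order fields and function names are hypothetical.

```python
import queue
import threading

fulfillment_queue: queue.Queue = queue.Queue()
shipped: list[int] = []


def place_order(order: dict) -> dict:
    """Synchronous part: validate and acknowledge, then defer the slow work."""
    if order["quantity"] <= 0:
        return {"status": "rejected", "reason": "invalid quantity"}
    fulfillment_queue.put(order)  # hand off to the background worker
    return {"status": "accepted", "order_id": order["order_id"]}


def fulfillment_worker() -> None:
    """Asynchronous part: drain the queue at the worker's own pace."""
    while True:
        order = fulfillment_queue.get()
        shipped.append(order["order_id"])  # stand-in for shipping/notifications
        fulfillment_queue.task_done()


threading.Thread(target=fulfillment_worker, daemon=True).start()

ack = place_order({"order_id": 1, "quantity": 2})
rejected = place_order({"order_id": 2, "quantity": 0})
fulfillment_queue.join()  # only for the demo: wait until the worker catches up
```

The caller gets `ack` back as soon as validation passes; the invalid order is rejected synchronously and never reaches the queue.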


Service Discovery

Problem: In a dynamic environment, services scale up and down. IP addresses change. How does Service A find a healthy instance of Service B?

Imagine your order service needs to call the payment service. Yesterday payment ran on 3 instances. Today it auto-scaled to 7. One crashed and was replaced. The IPs are completely different. Hardcoding IPs is impossible — you need a way to ask "where is payment right now?"

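A service registry is a database of running instances. Services register themselves on startup and deregister on shutdown; because a crashed instance cannot deregister itself, registries also rely on health checks or heartbeats to drop dead entries. Tools like Consul, etcd, and ZooKeeper serve this role.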
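
The core of a registry can be sketched as a table of "last seen" timestamps. The version below is an in-memory toy, not the data model of Consul, etcd, or ZooKeeper; it assumes heartbeat-based expiry so that instances which crash without deregistering still drop out of the list.

```python
class ServiceRegistry:
    """In-memory registry: instances expire unless they keep sending heartbeats."""

    def __init__(self, ttl_s: float = 10.0) -> None:
        self.ttl_s = ttl_s
        self._last_seen: dict[tuple[str, str], float] = {}

    def register(self, service: str, address: str, now: float) -> None:
        self._last_seen[(service, address)] = now

    heartbeat = register  # a heartbeat just refreshes the timestamp

    def deregister(self, service: str, address: str) -> None:
        self._last_seen.pop((service, address), None)

    def instances(self, service: str, now: float) -> list[str]:
        """Addresses of instances seen within the last ttl_s seconds."""
        return sorted(
            addr
            for (svc, addr), seen in self._last_seen.items()
            if svc == service and now - seen <= self.ttl_s
        )


registry = ServiceRegistry(ttl_s=10.0)
registry.register("payment", "10.0.1.5:8080", now=0.0)
registry.register("payment", "10.0.1.6:8080", now=0.0)
registry.heartbeat("payment", "10.0.1.5:8080", now=8.0)  # .6 has gone silent
alive = registry.instances("payment", now=15.0)
# alive == ["10.0.1.5:8080"]
```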

Client-Side Discovery

The client (the service making the call — e.g., the order service calling payment) queries the service registry to get a list of available instances, then picks one (round-robin, random, least connections).

  • Pro: client controls load balancing strategy
  • Con: every client needs discovery logic; tightly coupled to registry
  • Developer experience: your code fetches instance list from registry, then calls one directly
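
In code, client-side discovery amounts to "look up, choose, call". The sketch below injects the lookup as a plain function so it runs without a registry; `fake_registry` and its addresses are placeholders for whatever query your registry client actually exposes.

```python
import random
from typing import Callable


class DiscoveringClient:
    """Client-side discovery: look up instances, then choose one locally."""

    def __init__(self, lookup: Callable[[str], list[str]], rng: random.Random) -> None:
        self._lookup = lookup  # hypothetical registry query, e.g. an HTTP call
        self._rng = rng

    def choose(self, service: str) -> str:
        instances = self._lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        return self._rng.choice(instances)  # random load balancing


def fake_registry(service: str) -> list[str]:
    # Stand-in for a real registry response.
    table = {"payment": ["10.0.1.5:8080", "10.0.1.6:8080", "10.0.1.9:8080"]}
    return table.get(service, [])


client = DiscoveringClient(fake_registry, random.Random(0))
target = client.choose("payment")
url = f"http://{target}/charge"  # the caller then talks to the instance directly
```

Swapping `rng.choice` for round-robin or a latency-aware pick is where the "client controls load balancing strategy" advantage comes from.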

Server-Side Discovery

The client sends requests to a load balancer (AWS ALB, Kubernetes Service). The load balancer queries the registry and routes the request to a healthy instance.

  • Pro: clients are simple — just hit one endpoint
  • Con: extra network hop; load balancer can become a bottleneck
  • Developer experience: your code just calls http://payment-service/charge. Cluster DNS resolves the name to the Service's stable virtual IP, and Kubernetes routes the connection to a ready pod. You never see pod IPs.

DNS-Based Discovery

Instead of a registry-specific API or SDK, callers use plain DNS: services are registered under a DNS name (e.g., payment.internal), and resolving that name returns the IPs of healthy instances. Short TTLs (5–30s) ensure stale entries expire quickly.

Example: Consul can expose services as DNS records. Your code calls payment.service.consul — DNS resolves it to [10.0.1.5, 10.0.1.6, 10.0.1.9] and your HTTP client picks one.

  • Pro: universal — every language/framework understands DNS, no special SDK needed
  • Con: DNS caching (at OS, resolver, or library level) can route to dead instances even after TTL expires; no per-request load balancing — the client caches the resolved IP for the TTL duration
  • Developer experience: same as calling any domain name — resolution happens transparently
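
A minimal sketch of DNS-based discovery from the caller's side, using the standard `socket.getaddrinfo`. So that it runs without a Consul agent, the example injects a fake resolver returning the addresses from the text above; the helper names are assumptions.

```python
import random
import socket


def resolve_all(host: str, port: int, resolver=socket.getaddrinfo) -> list[str]:
    """Return the distinct IP addresses that DNS reports for host."""
    records = resolver(host, port, type=socket.SOCK_STREAM)
    # Each record is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({record[4][0] for record in records})


def pick_instance(host: str, port: int, resolver=socket.getaddrinfo) -> str:
    """Resolve on each call and pick a random address instead of pinning one IP."""
    return random.choice(resolve_all(host, port, resolver))


def fake_resolver(host, port, type=0):
    # Stand-in for DNS so the sketch runs anywhere.
    ips = ["10.0.1.5", "10.0.1.6", "10.0.1.9"]
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", (ip, port)) for ip in ips]


addresses = resolve_all("payment.service.consul", 8080, resolver=fake_resolver)
chosen = pick_instance("payment.service.consul", 8080, resolver=fake_resolver)
```

Re-resolving per call spreads load better than holding on to one address, but it does not escape the caching caveat above: the OS or resolver may still hand back a cached answer.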

Choosing a Discovery Pattern

| Pattern | Best for | Example |
| --- | --- | --- |
| Client-side | Fine-grained control over routing (custom load balancing, failover logic) | Netflix Eureka: a service registry where instances register themselves; the caller gets the full list and picks based on latency/zone |
| Server-side | Simplicity: callers don't need discovery logic | Kubernetes Services, AWS ALB: just call a stable endpoint |
| DNS-based | Polyglot environments where services use different languages/frameworks | Consul DNS: no SDK needed, any HTTP client works |

Default choice: server-side (Kubernetes or a load balancer). It's the simplest for developers and handles most cases. Use client-side when you need custom routing logic. Use DNS-based when you can't add a library or sidecar to every service.

In practice, Kubernetes uses server-side discovery (ClusterIP Services + kube-proxy), Consul supports both client-side lookups through its HTTP API and DNS-based lookups, and Eureka is built for client-side discovery.

In system design interviews, mention service discovery when you have multiple instances of a service. It signals you understand that services don't have fixed addresses in production.


Sidecar Pattern and Service Mesh

The Sidecar Pattern

A sidecar is a helper process deployed alongside your service (same pod in Kubernetes, same host). It handles cross-cutting concerns so your application code stays focused on business logic.

Without a sidecar, every service implements its own retry logic, TLS certificates, metrics collection, and circuit breaking. With 50 services, that's 50 implementations of the same concerns — each slightly different, each a potential bug.

A sidecar handles all of this in one place:

  • Mutual TLS (mTLS) between services: both sides verify each other's certificates, so only services holding a trusted certificate can communicate (unlike regular TLS, where only the client verifies the server)
  • Automatic retries with exponential backoff
  • Circuit breaking
  • Distributed tracing and metrics collection
  • Traffic shaping (canary deployments, A/B routing)
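
To make "automatic retries with exponential backoff" concrete, here is a sketch of the logic a sidecar would apply on your behalf. The parameter defaults, the choice of full jitter, and the decision to retry only on `ConnectionError` are assumptions for illustration; real proxies make these configurable.

```python
import random
import time


def retry_with_backoff(fn, attempts: int = 4, base_delay_s: float = 0.1,
                       max_delay_s: float = 2.0, sleep=time.sleep):
    """Call fn until it succeeds, doubling the delay cap after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out retry storms


calls = {"count": 0}


def flaky_service() -> str:
    # Fails twice, then succeeds: a stand-in for a transient network error.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky_service, sleep=lambda seconds: None)
# result == "ok" after three calls
```

Retries are only safe for idempotent operations; retrying a non-idempotent charge could bill the customer twice.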

Service Mesh

A service mesh deploys a sidecar proxy next to every service instance (Envoy in Istio, linkerd2-proxy in Linkerd). The mesh's control plane configures all the sidecars centrally.

What the mesh handles:

  • mTLS everywhere — zero-trust networking without app changes
  • Traffic routing — shift 5% of traffic to a new version
  • Circuit breaking — stop calling a failing service
  • Observability — automatic request tracing, latency histograms, error rates
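
Circuit breaking ("stop calling a failing service") can be sketched as a small state machine. The thresholds, the class and exception names, and the single-trial half-open behaviour below are illustrative assumptions, not the configuration model of Istio, Linkerd, or Envoy.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, now: float):
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("failing fast; service recently unhealthy")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # open (or re-open) the circuit
            raise
        self.failures = 0  # success closes the circuit fully
        return result


def failing():
    raise ConnectionError("payment service down")


breaker = CircuitBreaker(failure_threshold=2, reset_after_s=30.0)
outcomes = []
for t in (0.0, 1.0, 2.0, 40.0):
    try:
        breaker.call(failing if t < 40.0 else (lambda: "ok"), now=t)
        outcomes.append("ok")
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")
# outcomes == ["failed", "failed", "rejected", "ok"]
```

The third call never reaches the failing service: the breaker rejects it immediately, which protects both the caller's threads and the struggling downstream.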

When You Need It

  • Large microservice deployments (50+ services)
  • Implementing retries, TLS, and observability in each service individually is impractical
  • You need consistent security policies across all services

When You Don't

  • Small teams with fewer than 10 services — the operational overhead isn't worth it
  • Monolith or modular monolith — no inter-service network calls to manage
  • Team lacks Kubernetes expertise — a mesh adds significant complexity

Key Takeaways

  • Reverse proxies are standard in production: they handle TLS termination, caching, rate limiting, and load balancing outside your application code.
  • Use sync communication (REST/gRPC) when the caller needs an immediate response; use async (queues/streams) when work can be deferred or multiple consumers need the same event.
  • Service discovery solves the problem of finding healthy instances in dynamic environments — server-side discovery (load balancer) is the simplest starting point.
  • gRPC is often a better fit than REST for internal service-to-service calls: binary encoding and HTTP/2 multiplexing reduce overhead, and Protocol Buffers schemas give you strong typing.
  • A service mesh is powerful but expensive in complexity — adopt it when you have enough services that implementing cross-cutting concerns individually becomes unsustainable.
  • Start simple: reverse proxy → direct HTTP calls → add a message queue for async work → adopt service discovery as you scale → consider a mesh only when the pain justifies it.

© 2026 ByteLearn.dev.