18 - Real-Time Communication

The Problem

HTTP is request-response. The client asks, the server answers. But what if the server needs to push data to the client without being asked? Live chat messages, stock tickers, notifications, streaming AI responses.

You need a way to deliver data as soon as it becomes available, either by having the client ask repeatedly or by keeping a connection open so the server can push.

Polling

The simplest approach. The client repeatedly asks "anything new?" on a timer.

Client: GET /messages?since=123  → Server: []
Client: GET /messages?since=123  → Server: []
Client: GET /messages?since=123  → Server: [{id: 124, text: "hey"}]

Pros: simple to implement, works everywhere, stateless server.

Cons: wasteful. Most requests return nothing. You're burning bandwidth and server resources for empty responses. And there's always a delay (up to the polling interval) before new data appears.

Acceptable for low-frequency updates (check for new emails every 30 seconds). Terrible for real-time chat.
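A minimal sketch of the client-side polling loop, to make the cost of empty responses visible. The `fetch` callable stands in for `GET /messages?since=<id>`; the function name, parameters, and the injected `sleep` are assumptions made for illustration, not part of any particular library.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]


def poll_messages(
    fetch: Callable[[int], List[Message]],
    since: int,
    interval_s: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Message], int]:
    """Call fetch(since) up to max_polls times, pausing between calls.

    Returns the messages received and the number of polls that came back empty.
    """
    received: List[Message] = []
    empty_polls = 0
    for attempt in range(max_polls):
        batch = fetch(since)  # hypothetical stand-in for the HTTP request
        if batch:
            received.extend(batch)
            # Advance the cursor so the next poll only asks for newer messages.
            since = max(message["id"] for message in batch)
        else:
            empty_polls += 1
        if attempt < max_polls - 1:
            sleep(interval_s)  # new data waits up to this long before it is seen
    return received, empty_polls
```

Replaying the exchange shown above (two empty responses, then one message) through this loop would report two wasted polls for one delivered message.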

Long Polling

An improvement over polling. The client sends a request, and the server holds it open until there's new data (or a timeout).

Client: GET /messages?since=123  → (server holds connection open)
                                  → (30 seconds later, new message arrives)
                                  → Server: [{id: 124, text: "hey"}]
Client: GET /messages?since=124  → (holds again...)

Pros: near-instant delivery, less wasteful than polling, works through firewalls and proxies.

Cons: each held connection consumes server resources. Every response forces the client to reconnect, so each delivery costs a full request-response cycle, headers included. That overhead makes it a poor fit for high-frequency updates.

Used by: early Facebook chat, some notification systems.
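The server side of long polling can be sketched with `asyncio`: the handler waits on a condition until a newer message exists or a timeout expires. `MessageLog`, `long_poll`, and `timeout_s` are hypothetical names, and the in-memory list stands in for whatever store a real service would use.

```python
import asyncio
from typing import Any, Dict, List


class MessageLog:
    """In-memory message store whose readers can wait for new messages."""

    def __init__(self) -> None:
        self._messages: List[Dict[str, Any]] = []
        self._changed = asyncio.Condition()

    def _newer_than(self, since: int) -> List[Dict[str, Any]]:
        return [m for m in self._messages if m["id"] > since]

    async def publish(self, message: Dict[str, Any]) -> None:
        async with self._changed:
            self._messages.append(message)
            self._changed.notify_all()  # wake every held request

    async def long_poll(self, since: int, timeout_s: float) -> List[Dict[str, Any]]:
        """Hold the request until a message newer than `since` exists, or time out."""
        async with self._changed:
            try:
                await asyncio.wait_for(
                    self._changed.wait_for(lambda: bool(self._newer_than(since))),
                    timeout=timeout_s,
                )
            except asyncio.TimeoutError:
                return []  # empty response; the client simply asks again
            return self._newer_than(since)
```

A held request returns as soon as `publish` is called, which is where the near-instant delivery comes from; the timeout branch is the case where the client reconnects with nothing to show for it.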

Server-Sent Events (SSE)

A one-way stream from server to client over HTTP/HTTPS. The client opens a connection, and the server pushes events as they happen.

GET /stream HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"message": "hello"}

data: {"message": "world"}

data: [DONE]

Pros: simple API (just HTTP), automatic reconnection built into the browser, works through CDNs and proxies, lightweight.

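Cons: one-way only (server to client). Limited to UTF-8 text, so binary data must be encoded first (for example as Base64). Some proxies buffer responses, which breaks the stream. Over HTTP/1.1, browsers also cap the number of open connections per domain (typically six), and each SSE stream counts against that limit.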

Used by: ChatGPT's streaming responses, live dashboards, event feeds.

SSE is the right choice when you only need server-to-client push. It's simpler than WebSockets and works with existing HTTP infrastructure.
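The wire format shown above is simple enough to encode and decode by hand. The sketch below handles only the `data:` field and comment lines; `format_sse` and `parse_sse` are illustrative names, and the `event:`, `id:`, and `retry:` fields are deliberately ignored. Note that the `[DONE]` sentinel in the example is an application-level convention, not part of SSE itself.

```python
from typing import Iterable, Iterator, List


def format_sse(data: str) -> str:
    """Encode one event: one `data:` line per payload line, then a blank line."""
    return "".join(f"data: {line}\n" for line in data.split("\n")) + "\n"


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each event from an iterable of decoded text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            buffer.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
```

In a browser, the built-in `EventSource` API does this parsing and also handles reconnection; a hand-written parser like this is mostly useful for non-browser clients.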

WebSockets

A full-duplex, bidirectional connection. Both client and server can send messages at any time after the initial handshake.

Client: GET /chat (Upgrade: websocket)
Server: 101 Switching Protocols

Client ←→ Server (bidirectional messages)

Pros: true bidirectional communication, low overhead per message (no HTTP headers), supports binary data.

Cons: more complex to implement and scale. Scaling past one server requires sticky sessions or a connection registry, plus a pub/sub layer to route messages between servers. Doesn't work through some corporate proxies. Connection state must be managed.

Used by: chat applications, multiplayer games, collaborative editing (Google Docs), live trading platforms.
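The handshake sketched above hinges on one computation defined in RFC 6455: the server proves it understood the upgrade by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID. The function names below are illustrative, and the response builder skips the validation a real server would do (HTTP version, `Sec-WebSocket-Version`, origin checks).

```python
import base64
import hashlib
from typing import Dict

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(request_headers: Dict[str, str]) -> str:
    """Build the 101 response for an upgrade request (minimal validation only)."""
    if request_headers.get("Upgrade", "").lower() != "websocket":
        raise ValueError("not a WebSocket upgrade request")
    accept = websocket_accept(request_headers["Sec-WebSocket-Key"])
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept}\r\n"
        "\r\n"
    )
```

After this response, the connection stops carrying HTTP and both sides exchange WebSocket frames, which is why per-message overhead is small.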

Scaling WebSockets

WebSockets are stateful. Each connection is tied to a specific server. This creates challenges:

Connection management — a single server can hold tens of thousands of connections, but each consumes memory. You need to track which user is connected to which server.

Message routing — if User A is on Server 1 and User B is on Server 2, how does A's message reach B? You need a pub/sub layer (Redis Pub/Sub, Kafka) between servers.

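Load balancing: simple round-robin isn't enough on its own. It can spread new connections across servers, but connections are long-lived, so load drifts out of balance over time, and a reconnecting user may land on a different server than before. You need sticky sessions or a connection registry.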

Sticky sessions: the load balancer always routes the same user to the same server (via cookie or IP hash). Simple, but if that server dies, the user's connection is lost. You also can't freely scale down without disconnecting users.

Connection registry: a lookup table in Redis that maps each user to the server holding their connection. Any server can look up where a user is connected and route messages there. More flexible — servers can be added or removed without the load balancer needing special logic.

Reconnection — clients disconnect constantly (network switches, sleep mode). The server must handle reconnection gracefully and replay missed messages.
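The routing problem above (User A on Server 1, User B on Server 2) can be simulated in memory. In this sketch a plain dict plays the role of the Redis-backed connection registry and `Broker` stands in for Redis Pub/Sub; all class, method, and channel names are assumptions made for illustration, and per-user inbox lists replace real sockets.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Broker:
    """Stand-in for a pub/sub layer: delivers each message to channel subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, payload: dict) -> None:
        for handler in self._subscribers[channel]:
            handler(payload)


class ChatServer:
    def __init__(self, name: str, registry: Dict[str, str], broker: Broker) -> None:
        self.name = name
        self.registry = registry  # shared map: user -> server holding their connection
        self.broker = broker
        self.inboxes: Dict[str, List[dict]] = {}  # local connections only
        broker.subscribe(f"server:{name}", self._deliver)

    def connect(self, user: str) -> None:
        self.inboxes[user] = []
        self.registry[user] = self.name

    def disconnect(self, user: str) -> None:
        self.inboxes.pop(user, None)
        if self.registry.get(user) == self.name:
            del self.registry[user]

    def send(self, sender: str, recipient: str, text: str) -> bool:
        target = self.registry.get(recipient)
        if target is None:
            return False  # offline; a real system would store the message for replay
        self.broker.publish(f"server:{target}", {"from": sender, "to": recipient, "text": text})
        return True

    def _deliver(self, payload: dict) -> None:
        inbox = self.inboxes.get(payload["to"])
        if inbox is not None:
            inbox.append(payload)
```

Any server can route to any user with one registry lookup and one publish, which is what lets servers be added or removed without special load balancer logic.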

WebRTC

WebRTC (Web Real-Time Communication) enables peer-to-peer communication directly between browsers — no server relaying data in the middle.

Data channels — full-duplex bidirectional messaging between peers. Low latency because data doesn't round-trip through a server. Used for file sharing, peer-to-peer chat, and multiplayer gaming.

Media streams — audio and video between peers using WebRTC's media APIs. For 1:1 calls, media can flow directly peer-to-peer. For group calls, services like Google Meet, Discord, and Zoom use an SFU (Selective Forwarding Unit) — a server that receives each participant's stream and forwards it to others, avoiding the N×N mesh that pure P2P would require.

The catch: peers need help finding each other initially. A signaling server (your server, using WebSockets or HTTP) exchanges connection metadata (SDP¹ offers/answers and ICE² candidates). Once the connection is established, the signaling server is no longer in the data path.

¹ SDP (Session Description Protocol) — a text format describing what each peer supports: codecs, media types, bandwidth, and network addresses. Peer A creates an "offer" (here's what I can do), Peer B responds with an "answer" (here's what I'll accept). They agree on a common format before any media flows.

² ICE (Interactive Connectivity Establishment) — a framework for finding the best network path between peers. Each peer gathers "candidates" (possible IP:port combinations — local address, public address via STUN, relay via TURN) and they test connectivity on each until one works.

When NATs/firewalls block direct connections: a TURN server relays traffic as a fallback. This adds latency (traffic goes through a server again) but ensures connectivity.
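The fallback order can be sketched as "try direct paths first, relay last". This is a deliberate simplification: real ICE checks pairs of local and remote candidates and ranks them by a computed priority. `Candidate`, `pick_candidate`, and the `is_reachable` callback are hypothetical names, and the addresses in any usage are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    kind: str     # "host" (local), "srflx" (public address via STUN), or "relay" (TURN)
    address: str  # "ip:port"


# Illustrative ordering: lower values are tried first, the TURN relay last.
PREFERENCE = {"host": 0, "srflx": 1, "relay": 2}


def pick_candidate(
    candidates: Iterable[Candidate],
    is_reachable: Callable[[Candidate], bool],
) -> Optional[Candidate]:
    """Return the most preferred candidate that passes the connectivity check."""
    for candidate in sorted(candidates, key=lambda c: PREFERENCE[c.kind]):
        if is_reachable(candidate):
            return candidate
    return None  # no path at all, not even through the relay
```

When only the `relay` candidate passes the check, the call still connects, but every packet takes the extra hop through the TURN server.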

Aspect	WebSockets	WebRTC
Topology	Client ↔ Server	Peer ↔ Peer
Latency	Low (one hop to server)	Lowest when direct (a TURN relay adds a hop)
Setup	Simple (HTTP upgrade)	Complex (signaling + ICE)
Scaling	Server handles all connections	Peers handle their own connections
Use case	Chat rooms, live feeds, collaboration	Video calls, P2P file transfer, low-latency gaming

Use WebSockets when you need a central server (chat rooms with history, broadcasting to many users). Use WebRTC when you need the lowest possible latency between two peers and can handle the setup complexity.
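The mesh-versus-SFU trade-off mentioned for group calls comes down to counting streams. The sketch below assumes every participant sends one stream that every other participant receives; the function names and dictionary keys are illustrative.

```python
from typing import Dict


def mesh_streams(participants: int) -> Dict[str, int]:
    """Pure peer-to-peer: each peer sends its stream separately to every other peer."""
    others = participants - 1
    return {
        "uplinks_per_peer": others,
        "downlinks_per_peer": others,
        "total_streams": participants * others,
    }


def sfu_streams(participants: int) -> Dict[str, int]:
    """SFU: each peer uploads once; the server forwards a copy to everyone else."""
    others = participants - 1
    return {
        "uplinks_per_peer": 1,
        "downlinks_per_peer": others,
        "server_forwarded_streams": participants * others,
    }
```

The number of streams each peer receives is the same either way. What the SFU removes is the per-peer upload fan-out, which matters because upload bandwidth is usually the scarcer resource on a participant's connection.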

Choosing the Right Approach

Use case	Best fit
Infrequent updates (email check)	Polling
Notifications, feeds	Long polling or SSE
Streaming responses (AI, live data)	SSE
Chat, gaming, collaboration	WebSockets
IoT sensor data (bidirectional)	WebSockets
Video/voice calls	WebRTC
P2P file transfer, low-latency gaming	WebRTC

The decision tree:

Do you need peer-to-peer with lowest latency? → WebRTC
Do you need bidirectional communication via a server? → WebSocket
Is it server-to-client only? → SSE
Do you need prompt delivery but can only use plain request-response HTTP? → Long polling
Are updates infrequent, and is a delay of up to the polling interval acceptable? → Polling
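One possible encoding of the decision tree as a function, with the questions checked in the same order. The flag names are assumptions, and real decisions usually weigh more factors (infrastructure, client support, team experience) than four booleans can capture.

```python
def choose_transport(
    *,
    peer_to_peer: bool,
    bidirectional: bool,
    server_push: bool,
    infrequent: bool,
) -> str:
    """Walk the questions top to bottom and return the first match."""
    if peer_to_peer:
        return "WebRTC"
    if bidirectional:
        return "WebSockets"
    if server_push:
        return "SSE"
    if not infrequent:
        return "Long polling"  # frequent enough that waiting a full interval hurts
    return "Polling"
```

For example, a streaming AI response is neither peer-to-peer nor bidirectional but does need server push, so the function returns "SSE".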

Key Takeaways

Polling is simple but wasteful; long polling reduces waste but still has overhead
SSE is ideal for one-way server-to-client streaming (like ChatGPT responses)
WebSockets enable bidirectional real-time communication but are harder to scale
Scaling WebSockets requires a pub/sub layer for message routing between servers
WebRTC provides the lowest latency via peer-to-peer, but setup is complex (signaling, ICE, TURN fallback)
Choose based on directionality, update frequency, and latency requirements