18 - Real-Time Communication
The Problem
HTTP is request-response. The client asks, the server answers. But what if the server needs to push data to the client without being asked? Live chat messages, stock tickers, notifications, streaming AI responses.
You need a way to keep the connection alive and push data as it becomes available.
Polling
The simplest approach. The client repeatedly asks "anything new?" on a timer.
Client: GET /messages?since=123 → Server: []
Client: GET /messages?since=123 → Server: []
Client: GET /messages?since=123 → Server: [{id: 124, text: "hey"}]
Pros: simple to implement, works everywhere, stateless server.
Cons: wasteful. Most requests return nothing. You're burning bandwidth and server resources for empty responses. And there's always a delay (up to the polling interval) before new data appears.
Acceptable for low-frequency updates (check for new emails every 30 seconds). Terrible for real-time chat.
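The loop above can be sketched in a few lines. This is a minimal illustration, not a production client: `fetch_since` and `poll` are hypothetical names, the "server" is just an in-memory list, and a real client would make an HTTP request instead of calling a function.

```python
import time
from typing import Callable


def fetch_since(store: list, since: int) -> list:
    """Server side: return every message whose id is greater than `since`."""
    return [m for m in store if m["id"] > since]


def poll(fetch: Callable[[int], list], since: int,
         interval: float, max_requests: int) -> tuple:
    """Client side: ask "anything new?" on a timer.

    Returns (messages received, new cursor, number of empty responses).
    """
    received = []
    empty = 0
    for _ in range(max_requests):
        batch = fetch(since)
        if batch:
            received.extend(batch)
            since = batch[-1]["id"]  # advance the cursor past what we saw
        else:
            empty += 1               # a wasted round trip
        time.sleep(interval)
    return received, since, empty
```

The `empty` counter makes the cost visible: every empty response is a request the server had to handle for nothing.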
Long Polling
An improvement over polling. The client sends a request, and the server holds it open until there's new data (or a timeout).
Client: GET /messages?since=123 → (server holds connection open)
→ (30 seconds later, new message arrives)
→ Server: [{id: 124, text: "hey"}]
Client: GET /messages?since=124 → (holds again...)
Pros: near-instant delivery, less wasteful than polling, works through firewalls and proxies.
Cons: each held connection consumes server resources. Reconnection overhead after each response. Not truly real-time for high-frequency updates.
Used by: early Facebook chat, some notification systems.
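Here is one way the "hold the request open" part could look on the server. It is a sketch under assumptions: `MessageChannel` is a hypothetical in-memory class, and each held request is modeled as a thread blocked on a `threading.Condition` until a message arrives or the timeout expires.

```python
import threading


class MessageChannel:
    """In-memory message log that lets a request wait for new data."""

    def __init__(self):
        self._messages = []
        self._cond = threading.Condition()

    def publish(self, text: str) -> int:
        """Store a message and wake every held request."""
        with self._cond:
            msg_id = len(self._messages) + 1
            self._messages.append({"id": msg_id, "text": text})
            self._cond.notify_all()
            return msg_id

    def wait_for_messages(self, since: int, timeout: float) -> list:
        """Block until a message newer than `since` exists or `timeout` elapses."""
        def has_new() -> bool:
            return any(m["id"] > since for m in self._messages)

        with self._cond:
            self._cond.wait_for(has_new, timeout=timeout)
            # On timeout this is simply an empty list; the client then re-requests.
            return [m for m in self._messages if m["id"] > since]
```

Each waiting request ties up a thread here, which is exactly the "held connection consumes server resources" cost from the cons list. An async server would reduce that cost but not remove it.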
Server-Sent Events (SSE)
A one-way stream from server to client over HTTP/HTTPS. The client opens a connection, and the server pushes events as they happen.
GET /stream HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"message": "hello"}

data: {"message": "world"}

data: [DONE]

Pros: simple API (just HTTP), automatic reconnection built into the browser, works through CDNs and proxies, lightweight.
Cons: one-way only (server to client). Limited to UTF-8 text, so binary payloads need an encoding such as Base64. Some proxies buffer responses, which delays or breaks the stream.
Used by: ChatGPT's streaming responses, live dashboards, event feeds.
SSE is the right choice when you only need server-to-client push. It's simpler than WebSockets and works with existing HTTP infrastructure.
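The wire format is simple enough to write by hand. Each event is a set of `field: value` lines, and a blank line ends the event. The sketch below formats and parses that text; `format_sse` and `parse_sse` are hypothetical helper names, and the parser handles only the `data`, `event`, and `id` fields plus comment lines, not every edge case in the spec.

```python
import json
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one event. Multi-line data becomes several `data:` lines."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event


def parse_sse(stream: str) -> list:
    """Split a received stream back into events (dicts of field -> value)."""
    events = []
    for block in stream.split("\n\n"):
        fields = {}
        data_lines = []
        for line in block.split("\n"):
            if not line or line.startswith(":"):  # empty line or comment
                continue
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]                 # drop the single optional space
            if name == "data":
                data_lines.append(value)
            else:
                fields[name] = value
        if data_lines:
            fields["data"] = "\n".join(data_lines)
            events.append(fields)
    return events


stream = format_sse(json.dumps({"message": "hello"})) + format_sse("[DONE]")
```

In a browser you would not parse this yourself; `EventSource` does it and reconnects automatically. The sketch is only meant to show how little there is on the wire.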
WebSockets
A full-duplex, bidirectional connection. Both client and server can send messages at any time after the initial handshake.
Client: GET /chat (Upgrade: websocket)
Server: 101 Switching Protocols
Client ←→ Server (bidirectional messages)
Pros: true bidirectional communication, low overhead per message (no HTTP headers), supports binary data.
Cons: more complex to implement and scale. Running more than one server requires a pub/sub layer for message routing, plus sticky sessions or a connection registry. Some corporate proxies block the upgrade. Connection state must be managed.
Used by: chat applications, multiplayer games, collaborative editing (Google Docs), live trading platforms.
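The handshake has one step that is easy to miss: the server must prove it understood the upgrade by hashing the client's `Sec-WebSocket-Key` with a fixed GUID from RFC 6455 and returning the result in `Sec-WebSocket-Accept`. The sketch below shows just that step. The function names are hypothetical, and in practice a WebSocket library does this for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every WebSocket server uses the same value.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute Sec-WebSocket-Accept: base64(sha1(client key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build the 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

With the sample key from the RFC, `dGhlIHNhbXBsZSBub25jZQ==`, `accept_key` returns `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. After this response, both sides stop speaking HTTP and exchange WebSocket frames on the same TCP connection.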
Scaling WebSockets
WebSockets are stateful. Each connection is tied to a specific server. This creates challenges:
Connection management — a single server can hold tens of thousands of connections, but each consumes memory. You need to track which user is connected to which server.
Message routing — if User A is on Server 1 and User B is on Server 2, how does A's message reach B? You need a pub/sub layer (Redis Pub/Sub, Kafka) between servers.
Load balancing — you can't use simple round-robin. You need sticky sessions or a connection registry.
Sticky sessions: the load balancer always routes the same user to the same server (via cookie or IP hash). Simple, but if that server dies, the user's connection is lost. You also can't freely scale down without disconnecting users.
Connection registry: a lookup table in Redis that maps each user to the server holding their connection. Any server can look up where a user is connected and route messages there. More flexible — servers can be added or removed without the load balancer needing special logic.
Reconnection — clients disconnect constantly (network switches, sleep mode). The server must handle reconnection gracefully and replay missed messages.
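To make the routing problem concrete, here is a toy version of the registry plus pub/sub setup. Everything in it is a stand-in: a plain dict plays the role of the Redis registry, `InMemoryBus` plays the role of the pub/sub layer, and a list per user plays the role of an open socket. The class and method names are hypothetical.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for a pub/sub layer, with one channel per server."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str, handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, payload: dict) -> None:
        for handler in self._subscribers[channel]:
            handler(payload)


class ChatServer:
    def __init__(self, name: str, registry: dict, bus: InMemoryBus):
        self.name = name
        self.registry = registry  # shared lookup table: user -> server name
        self.bus = bus
        self.inboxes = {}         # user -> delivered messages (stand-in for sockets)
        bus.subscribe(name, self._deliver)

    def connect(self, user: str) -> None:
        self.inboxes[user] = []
        self.registry[user] = self.name

    def disconnect(self, user: str) -> None:
        self.inboxes.pop(user, None)
        if self.registry.get(user) == self.name:
            del self.registry[user]

    def send(self, sender: str, recipient: str, text: str) -> bool:
        """Look up which server holds the recipient and publish to its channel."""
        target = self.registry.get(recipient)
        if target is None:
            return False          # recipient offline; a real system would queue this
        self.bus.publish(target, {"from": sender, "to": recipient, "text": text})
        return True

    def _deliver(self, payload: dict) -> None:
        inbox = self.inboxes.get(payload["to"])
        if inbox is not None:
            inbox.append(payload)
```

If Alice connects to one `ChatServer` and Bob to another, `send("alice", "bob", "hey")` on Alice's server lands in Bob's inbox on the other one. Neither server needs to know about the other, only about the registry and the bus. Replaying missed messages after a reconnect is left out here; it would need a per-user message log on top of this.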
WebRTC
WebRTC (Web Real-Time Communication) enables peer-to-peer communication directly between browsers — no server relaying data in the middle.
Data channels — full-duplex bidirectional messaging between peers. Low latency because data doesn't round-trip through a server. Used for file sharing, peer-to-peer chat, and multiplayer gaming.
Media streams — audio and video between peers using WebRTC's media APIs. For 1:1 calls, media can flow directly peer-to-peer. For group calls, services like Google Meet, Discord, and Zoom use an SFU (Selective Forwarding Unit) — a server that receives each participant's stream and forwards it to others, avoiding the N×N mesh that pure P2P would require.
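A quick calculation shows why group calls move away from a pure mesh. The sketch below only counts connections and uploads per participant; it ignores bitrate, simulcast, and everything else a real SFU deals with. The function names are hypothetical.

```python
def mesh_load(participants: int) -> dict:
    """Full mesh: every peer connects to, and uploads to, every other peer."""
    others = participants - 1
    return {
        "connections_per_peer": others,
        "uploads_per_peer": others,
        "total_connections": participants * others // 2,
    }


def sfu_load(participants: int) -> dict:
    """SFU: every peer holds one connection to the server and uploads once."""
    return {
        "connections_per_peer": 1,
        "uploads_per_peer": 1,
        "total_connections": participants,
    }
```

For a 10-person call, the mesh asks every participant to upload 9 copies of their stream over 9 connections, 45 connections in total. With an SFU each participant uploads once and the server does the fan-out.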
The catch: peers need help finding each other initially. A signaling server (your server, using WebSockets or HTTP) exchanges connection metadata (SDP¹ offers/answers and ICE² candidates). Once the connection is established, the server is out of the loop.
¹ SDP (Session Description Protocol) — a text format describing what each peer supports: codecs, media types, bandwidth, and network addresses. Peer A creates an "offer" (here's what I can do), Peer B responds with an "answer" (here's what I'll accept). They agree on a common format before any media flows.
² ICE (Interactive Connectivity Establishment) — a framework for finding the best network path between peers. Each peer gathers "candidates" (possible IP:port combinations — local address, public address via STUN, relay via TURN) and they test connectivity on each until one works.
When NATs/firewalls block direct connections: a TURN server relays traffic as a fallback. This adds latency (traffic goes through a server again) but ensures connectivity.
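The "try direct first, fall back to relay" behavior comes from how ICE ranks candidates. The sketch below uses the priority formula and recommended type preferences from the ICE specification (RFC 8445), but it is a simplification: real ICE tests pairs of local and remote candidates, while this ranks single candidates. `pick_candidate` and the `is_reachable` callback are hypothetical.

```python
# Recommended type preferences from RFC 8445: direct paths rank above relays.
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


def candidate_priority(kind: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component id)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_preference << 8) + (256 - component_id)


def pick_candidate(candidates: list, is_reachable):
    """Try candidates from highest to lowest priority; return the first that works."""
    ranked = sorted(candidates, key=lambda c: candidate_priority(c["type"]), reverse=True)
    for candidate in ranked:
        if is_reachable(candidate):
            return candidate
    return None  # no path at all, not even via TURN
```

A host candidate (local address) outranks a server-reflexive one (public address learned via STUN), which outranks a relay candidate (TURN). The relay is only chosen when the direct paths fail their connectivity checks, which is the fallback described above.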
| | WebSockets | WebRTC |
|---|---|---|
| Topology | Client ↔ Server | Peer ↔ Peer |
| Latency | Low (one hop to server) | Lowest (direct, no server) |
| Setup | Simple (HTTP upgrade) | Complex (signaling + ICE) |
| Scaling | Server handles all connections | Peers handle their own connections |
| Use case | Chat rooms, live feeds, collaboration | Video calls, P2P file transfer, low-latency gaming |
Use WebSockets when you need a central server (chat rooms with history, broadcasting to many users). Use WebRTC when you need the lowest possible latency between two peers and can handle the setup complexity.
Choosing the Right Approach
| Use case | Best fit |
|---|---|
| Infrequent updates (email check) | Polling |
| Notifications, feeds | Long polling or SSE |
| Streaming responses (AI, live data) | SSE |
| Chat, gaming, collaboration | WebSockets |
| IoT sensor data (bidirectional) | WebSockets |
| Video/voice calls | WebRTC |
| P2P file transfer, low-latency gaming | WebRTC |
The decision tree:
- Do you need peer-to-peer with lowest latency? → WebRTC
- Do you need bidirectional communication via a server? → WebSocket
- Is it server-to-client only? → SSE
- Can you tolerate seconds of delay? → Long polling
- Is it very low frequency? → Polling
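The same tree, written as a function. The questions are checked in the order listed, and the first "yes" wins. The parameter names are hypothetical, and returning `None` when nothing matches is an assumption of this sketch, not part of the tree.

```python
from typing import Optional


def choose_transport(peer_to_peer: bool = False,
                     bidirectional: bool = False,
                     server_to_client_only: bool = False,
                     tolerates_seconds_of_delay: bool = False,
                     very_low_frequency: bool = False) -> Optional[str]:
    """Walk the decision tree top to bottom and return the first match."""
    if peer_to_peer:
        return "WebRTC"
    if bidirectional:
        return "WebSocket"
    if server_to_client_only:
        return "SSE"
    if tolerates_seconds_of_delay:
        return "Long polling"
    if very_low_frequency:
        return "Polling"
    return None  # none of the questions applied
```

For example, `choose_transport(server_to_client_only=True)` returns `"SSE"`, and a feature that is both peer-to-peer and bidirectional still resolves to `"WebRTC"` because that question comes first.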
Key Takeaways
- Polling is simple but wasteful; long polling reduces waste but still has overhead
- SSE is ideal for one-way server-to-client streaming (like ChatGPT responses)
- WebSockets enable bidirectional real-time communication but are harder to scale
- Scaling WebSockets requires a pub/sub layer for message routing between servers
- WebRTC provides the lowest latency via peer-to-peer, but setup is complex (signaling, ICE, TURN fallback)
- Choose based on directionality and frequency of updates