Online Order Flow
This example follows the system design process from lesson 01 to design an e-commerce order system end to end.
Step 1: Requirements
Functional:
- User places an order with items from their cart
- System validates inventory, charges payment, confirms the order
- System sends a confirmation email, generates an invoice, and notifies the warehouse
- User can track order status
Non-functional:
- Order confirmation in under 500ms (user is waiting)
- Never oversell (strong consistency on inventory)
- Never lose an order after payment succeeds
- Email/invoice/warehouse can be delayed but must eventually complete
Key constraint: the user only waits for payment + inventory. Everything else happens after.
Step 2: Estimation
Orders per day: 100,000
Peak orders per second: 100K / 86,400 × 3 ≈ 3.5/sec (assuming peak traffic is 3× the daily average)
Average order size: 2 KB (items, address, payment ref)
Daily storage: 100K × 2 KB = 200 MB
Async events per order: 4 (email, invoice, warehouse, analytics)
Queue throughput: ~14 messages/sec peak
This is modest. A single database and a single queue handle this easily. The complexity is in correctness (inventory, payment), not scale.
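The estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic: the constant names are illustrative, the 3× peak factor is the multiplier implied by the peak calculation, and the yearly storage figure is an extra derived number that is not in the original estimate.

```python
# Back-of-envelope numbers for the order system.
ORDERS_PER_DAY = 100_000
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3          # assumption: peak traffic is 3x the daily average
ORDER_SIZE_KB = 2        # items, address, payment ref
EVENTS_PER_ORDER = 4     # email, invoice, warehouse, analytics

avg_orders_per_sec = ORDERS_PER_DAY / SECONDS_PER_DAY
peak_orders_per_sec = avg_orders_per_sec * PEAK_FACTOR
daily_storage_mb = ORDERS_PER_DAY * ORDER_SIZE_KB / 1000    # decimal megabytes
yearly_storage_gb = daily_storage_mb * 365 / 1000           # derived, not in the estimate
peak_queue_msgs_per_sec = peak_orders_per_sec * EVENTS_PER_ORDER

print(f"peak orders/sec:     {peak_orders_per_sec:.1f}")      # 3.5
print(f"daily storage (MB):  {daily_storage_mb:.0f}")         # 200
print(f"yearly storage (GB): {yearly_storage_gb:.0f}")        # 73
print(f"peak queue msgs/sec: {peak_queue_msgs_per_sec:.0f}")  # 14
```

Even a full year of orders is on the order of tens of gigabytes, which is why scale is not the hard part of this design.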
Step 3: High-Level Design
User
↓
Load Balancer
↓
Order Service
│
┌────┴────┐
▼ ▼
Database Cache
│
▼
Queue
│
┌───┬┴──┬───┐
▼ ▼ ▼ ▼
Email Invoice Warehouse Analytics
Two phases:
- Synchronous (user waits): validate → check inventory → reserve → charge → confirm
- Asynchronous (user doesn't wait): publish event → workers process independently
Why a queue between the phases? Because delivery services (email, warehouse API) are slow and unreliable. The queue decouples "order confirmed" from "email sent." If SendGrid is down, the message retries later. The user already has their confirmation.
Step 4: Deep Dive
The Synchronous Path (200-500ms)
1. Validate the cart: do these products exist? Are the prices correct? Is the user authenticated?
2. Check and reserve inventory: this must be a database transaction with strong consistency (lesson 13). When two users try to buy the last item at the same time, one succeeds and the other gets "out of stock." Without strong consistency, both could succeed and you'd oversell.
3. Charge payment: call Stripe or another payment provider. If the charge fails, release the reserved inventory and return an error. If it succeeds but a later step fails, you need to refund the charge or retry the remaining steps. Send an idempotency key with the charge so that a retried request cannot bill the customer twice.
4. Create order record: write the confirmed order to the database.
5. Publish event: drop order.placed onto the queue.
6. Respond: return "Order confirmed" with the order ID.
The Asynchronous Path (seconds to minutes)
Each worker consumes the order.placed event independently:
- Email Worker → sends confirmation via SendGrid. Retries on failure.
- Invoice Worker → generates PDF, stores in S3. Not urgent.
- Warehouse Worker → calls fulfillment API. May be slow or unreliable.
- Analytics Worker → records the sale for dashboards.
If any worker fails, the message stays in the queue and retries. Other workers aren't affected. This is the power of decoupling.
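The isolation between workers can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions: `run_workers` and the handler functions are hypothetical, each worker gets its own copy of the event, and retries happen immediately, whereas a real queue would usually wait between attempts.

```python
from collections import deque


def run_workers(event, handlers, max_attempts=3):
    """Deliver one event to every handler, retrying each handler on its own.

    Returns, per handler, the attempt number that succeeded, or None if
    every attempt failed.
    """
    results = {}
    for name, handler in handlers.items():
        # One queue per worker: a failing worker only re-queues its own copy.
        queue = deque([(event, 1)])
        results[name] = None
        while queue:
            message, attempt = queue.popleft()
            try:
                handler(message)
                results[name] = attempt
            except Exception:
                if attempt < max_attempts:
                    queue.append((message, attempt + 1))  # stays queued for a retry
    return results


email_calls = 0


def send_email(event):
    # Simulates an email provider that is down for the first two attempts.
    global email_calls
    email_calls += 1
    if email_calls < 3:
        raise ConnectionError("email provider unavailable")


def notify_warehouse(event):
    """Succeeds on the first attempt."""


results = run_workers(
    {"type": "order.placed", "order_id": "o-1"},
    {"email": send_email, "warehouse": notify_warehouse},
)
print(results)  # {'email': 3, 'warehouse': 1}
```

The email worker needs three attempts while the warehouse worker succeeds on its first, and neither outcome influences the other.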
Why This Split Matters
If you did everything synchronously, the user would wait 2-5 seconds (email + PDF + warehouse). Worse, if any downstream service were down, the order would fail even though payment had already succeeded. The queue makes each piece independent and retriable.
Concepts Used
| Concept | Lesson | How it's used here |
|---|---|---|
| System design process | 01 | Requirements → Estimation → Design → Deep dive |
| Load balancing | 06 | Distributes requests across Order Service instances |
| Caching | 07 | Product data and prices cached to avoid DB hits |
| Database | 09 | Stores orders, inventory, user data |
| Consistency | 13 | Strong consistency for inventory to prevent overselling |
| Message queue | 15 | Decouples confirmation from email/invoice/warehouse |
| API design | 17 | REST endpoint for placing orders |
| Rate limiting | 19 | Prevents bots from placing fake orders |
| Failure handling | 21 | Circuit breaker on payment, retries on workers |
The Key Insight
Draw the line between "what the user waits for" and "what happens after." Everything before the response must be fast and correct. Everything after can be async, retried, and eventually consistent.