02. How to Choose a Model


The Four Dimensions

Every model choice comes down to four tradeoffs:

| Dimension | Question |
|---|---|
| Quality | How good does the output need to be? |
| Speed | How fast do I need the response? |
| Cost | How much am I willing to pay per request? |
| Privacy | Can the data leave my machine? |

No model wins on all four. You're always trading one for another.

The Decision Framework

Ask these questions in order:

1. Does the data need to stay local?

If yes, use a local model (Ollama). Skip cloud APIs entirely. This applies to proprietary code, medical records, legal documents, or anything you can't send to a third party.

2. How complex is the task?

| Complexity | Examples | Model tier | Options |
|---|---|---|---|
| Simple | Classification, extraction, reformatting | Cheapest that passes your quality bar | GPT-4o-mini, Claude Haiku, Gemini Flash |
| Medium | Summarization, code generation, analysis | Mid-tier | GPT-4o, Claude Sonnet, Gemini Pro |
| Hard | Complex reasoning, multi-step logic, novel problems | Thinking / frontier | o3, Claude Opus, DeepSeek-R1 |

3. What's the volume?

| Scenario | Priority | Guidance |
|---|---|---|
| One-off task | Quality | Use the best model available; cost is irrelevant |
| 1,000 req/day | Balance | Use the cheapest model that meets your quality bar |
| 100,000 req/day | Cost | Every cent per request is $1,000/day. Optimize ruthlessly |

4. What's the latency requirement?

| Scenario | Strategy |
|---|---|
| Interactive (user is waiting) | Fast model, streaming enabled |
| Background (batch processing) | Slow model is fine; optimize for cost |
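The four questions above can be sketched as a simple routing function. This is an illustrative sketch, not a real SDK: the `Task` fields, tier names, and model mapping are all assumptions chosen to mirror the framework.

```python
from dataclasses import dataclass

# Hypothetical task description; field names are illustrative, not from any SDK.
@dataclass
class Task:
    private_data: bool    # must the data stay on-machine?
    complexity: str       # "simple" | "medium" | "hard"
    requests_per_day: int
    interactive: bool     # is a user waiting on the response?

def choose_model(task: Task) -> str:
    # Question 1: privacy trumps everything else.
    if task.private_data:
        return "local (Ollama)"
    # Question 2: complexity sets the tier.
    tier = {"simple": "cheap", "medium": "mid", "hard": "thinking"}[task.complexity]
    # Question 3: at high volume, start cheap and upgrade only on observed failures.
    if task.requests_per_day > 10_000 and tier == "mid":
        tier = "cheap"
    # Question 4: interactive traffic rules out slow thinking models.
    if task.interactive and tier == "thinking":
        tier = "mid"
    return {"cheap": "GPT-4o-mini", "mid": "GPT-4o", "thinking": "o3"}[tier]
```

The ordering matters: privacy is a hard constraint, so it short-circuits before any quality/cost reasoning happens.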

Matching Tasks to Models

| Task | Best choice | Why |
|---|---|---|
| Classify sentiment (pos/neg/neutral) | GPT-4o-mini, Haiku | Simple task, cheapest wins |
| Summarize a meeting transcript | Claude Sonnet | Good at long text, follows format instructions well |
| Generate a REST API in Go | Claude Sonnet, GPT-4o | Both strong at code |
| Review architecture for flaws | o3, DeepSeek-R1 | Needs deep reasoning |
| Extract structured data from emails | GPT-4o-mini | Structured output is GPT's strength |
| Analyze a 200-page PDF | Gemini Pro | 1M-token context window fits the whole doc |
| Chat with users in production | GPT-4o-mini, Haiku | Fast, cheap, good enough for conversation |
| Translate to 5 languages | Qwen 2.5 | Strong multilingual support, runs locally |

The "Good Enough" Principle

Don't reach for the best model. Use the cheapest model that produces acceptable output for your specific task.

Example: Classify support tickets into 5 categories

| Model | Accuracy | Cost/ticket | Cost at 10K tickets/day |
|---|---|---|---|
| GPT-4o | 98% | $0.0030 | $30/day |
| GPT-4o-mini | 95% | $0.0002 | $2/day |
| Haiku | 94% | $0.0001 | $1/day |

Haiku wins. The 4% accuracy gap doesn't justify 30× the cost.

For most classification, extraction, and formatting tasks, cheap models are good enough. Save expensive models for tasks where quality actually matters — complex reasoning, creative writing, or nuanced analysis.
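The arithmetic behind the ticket example is worth making explicit. A minimal sketch, where the accuracy and cost figures come straight from the table above and the quality bar is a parameter you set:

```python
# Reproduce the daily-cost arithmetic from the ticket-classification example.
def daily_cost(cost_per_request: float, requests_per_day: int) -> float:
    return cost_per_request * requests_per_day

candidates = {
    "GPT-4o":      {"accuracy": 0.98, "cost": 0.0030},
    "GPT-4o-mini": {"accuracy": 0.95, "cost": 0.0002},
    "Haiku":       {"accuracy": 0.94, "cost": 0.0001},
}

# The "good enough" principle: cheapest model whose accuracy clears the bar.
def cheapest_passing(bar: float) -> str:
    passing = {m: v for m, v in candidates.items() if v["accuracy"] >= bar}
    return min(passing, key=lambda m: passing[m]["cost"])

print(round(daily_cost(0.0030, 10_000), 2))  # → 30.0
print(cheapest_passing(0.90))                # → Haiku
```

Note how the choice flips as the bar moves: require 96% accuracy and only GPT-4o qualifies; at 90%, Haiku wins on cost.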

Two Phases of Model Selection

Prototyping: Start with the best model to establish what "good" looks like. This gives you a quality ceiling and a reference to compare against. Don't worry about cost yet — you're finding the bar.

Production: Downgrade to the cheapest model that still meets that bar. Every cent matters at scale.

When to Upgrade (or Downgrade)

In production, start cheap and upgrade only when you see actual failures:

  • Output is wrong or incomplete → try a larger model
  • Model doesn't follow your format → try Claude (better at instruction following)
  • Reasoning is shallow or misses edge cases → try a thinking model (o3, R1)
  • Context is too long for the model's window → try Gemini (1M tokens)

If you prototyped with a frontier model, downgrade methodically:

  • Run your test cases against a cheaper model
  • Compare output quality side by side
  • If the cheaper model passes your quality bar → ship it
  • If it doesn't → move one tier up and test again
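The downgrade loop above can be sketched as a tiny harness. The stub "models" here are placeholder lambdas standing in for real API calls, and the tier names are illustrative; the point is the cheapest-first iteration against a fixed test set.

```python
from typing import Callable

def pick_cheapest_passing(
    tiers: list[tuple[str, Callable[[str], str]]],  # (name, run_fn), cheapest first
    test_cases: list[tuple[str, str]],              # (input, expected output)
    quality_bar: float,                             # fraction of cases that must pass
) -> str:
    for name, run in tiers:
        passed = sum(run(inp) == expected for inp, expected in test_cases)
        if passed / len(test_cases) >= quality_bar:
            return name  # cheapest tier that clears the bar: ship it
    return tiers[-1][0]  # nothing passed; fall back to the strongest tier

# Stubs: the cheap model misses one hard case, the mid tier gets everything.
cheap = lambda x: {"easy1": "A", "easy2": "B", "hard": "WRONG"}[x]
mid   = lambda x: {"easy1": "A", "easy2": "B", "hard": "C"}[x]
cases = [("easy1", "A"), ("easy2", "B"), ("hard", "C")]

print(pick_cheapest_passing([("mini", cheap), ("sonnet", mid)], cases, 0.9))  # → sonnet
```

In practice the exact-match check would be replaced by whatever your quality bar really is (a rubric, a grader model, human review), but the side-by-side comparison structure stays the same.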

The Model Tier List (2025)

Tier 1, Frontier (best quality, highest cost):

  • GPT-4o, Claude Sonnet, Gemini 2.5 Pro

Tier 2, Reasoning (slow but deep thinking):

  • o3, o4-mini, DeepSeek-R1

Tier 3, Fast and Cheap (90% quality, 10% cost):

  • GPT-4o-mini, Claude Haiku, Gemini Flash

Tier 4, Local/Open (free, private, runs on your hardware):

  • Llama 3.1 8B, Qwen 2.5 7B/14B, DeepSeek-V3

Most tasks need Tier 3. Some need Tier 1. Few need Tier 2. Use Tier 4 when privacy matters or when you need to eliminate per-request cost entirely.

Cloud vs Local Equivalents

| Cloud model | Local equivalent | Notes |
|---|---|---|
| GPT-4o-mini, Haiku | Llama 3.1 8B, Qwen 2.5 7B | Good for simple tasks, fast |
| GPT-4o, Claude Sonnet | Qwen 2.5 14B, Llama 3.1 70B | 14B runs on 16GB RAM; 70B needs a server |
| o3, Claude Opus | DeepSeek-R1 14B/70B | Thinking model, runs locally but slow |
| Gemini Flash | Llama 3.2 3B | Ultra-fast, limited quality |

Local models won't match cloud quality at the same parameter count. But they're free, private, and often good enough for simple to medium tasks.
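The RAM figures in the table come from a standard rule of thumb: weight memory is roughly parameter count times bytes per parameter, plus headroom for the KV cache and runtime. This is an estimate, not a guarantee; actual usage varies by runtime and context length.

```python
# Rough rule of thumb: weights alone need params x (bits / 8) bytes.
# Leave extra headroom (often a few GB) for KV cache and the runtime itself.
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

print(weight_memory_gb(14, 4))  # → 7.0  (4-bit 14B: weights fit comfortably in 16GB)
print(weight_memory_gb(70, 4))  # → 35.0 (why a 70B model needs a server)
```

This also shows why quantization matters so much locally: the same 14B model at 16-bit would need ~28GB for weights alone.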

Real Example: Building a Feature

You're building an AI-powered code review feature. Here's how to think through model selection:

Requirements:

  • Reviews pull requests (50–500 lines of code)
  • Finds bugs, suggests improvements
  • Runs on every PR (high volume)
  • Users see results in under 10 seconds

Walking through the framework:

| Question | Answer |
|---|---|
| 1. Privacy? | No; the code is already on GitHub, so cloud is fine. |
| 2. Complexity? | Medium; it needs to understand code, not prove theorems. |
| 3. Volume? | High; every PR across the team. |
| 4. Latency? | Moderate; under 10 seconds is acceptable. |

Conclusion: Start with GPT-4o-mini. It's fast, cheap, and decent at code. If reviews are too shallow, upgrade to Claude Sonnet. Don't start with o3 — it's slow and expensive for this volume.

Key Takeaways

  • Four dimensions: quality, speed, cost, privacy. You always trade between them.
  • Prototyping: start with the best model to find your quality ceiling.
  • Production: downgrade to the cheapest model that still meets that bar.
  • Simple tasks (classification, extraction) → cheap models (mini, Haiku, Flash)
  • Complex reasoning → thinking models (o3, R1)
  • Long documents → Gemini (1M context)
  • Privacy-sensitive data → local models (Ollama)
  • Most tasks need a Tier 3 model. Don't default to the most expensive option.

