01. The Model Landscape
The Big Picture
There are six major families of AI models you'll run into. Each comes from a different company, has different strengths, and sits at a different price point. Once you understand the landscape, you stop defaulting to "just use ChatGPT" and start picking the right tool for the job.
The Families
OpenAI (GPT)
The most well-known. GPT-4o is the flagship: fast, good at everything, strong at structured output and code. GPT-4o-mini is the budget option and it's surprisingly capable for simple tasks at a fraction of the cost.
- GPT-4o: General-purpose workhorse. Good at code, writing, analysis, structured output.
- GPT-4o-mini: 90% of GPT-4o quality for 95% less cost. Use for simple tasks.
- o3 / o4-mini: "Thinking" models. They spend extra time reasoning before answering. Best for math, logic, and complex code.
Anthropic (Claude)
Claude is known for following instructions precisely, handling long documents well, and being strong at writing and analysis. Many developers prefer it over GPT for coding tasks.
- Claude Sonnet: The sweet spot. Fast, capable, good at code and long context.
- Claude Haiku: Ultra-fast and cheap. Great for classification, extraction, and simple tasks.
- Claude Opus: Most capable but expensive and slower. For the hardest problems.
Google (Gemini)
Gemini's standout feature is context window size. Gemini 2.5 Pro handles up to 1 million tokens, meaning you can feed it an entire codebase or a 500-page document in one shot. Also strong at multimodal tasks (text + images).
- Gemini 2.5 Pro: Huge context, strong reasoning, competitive pricing.
- Gemini 2.5 Flash: Fast and cheap, good for high-volume tasks.
Meta (Llama)
Open-weight models you can run locally or on your own infrastructure. No API costs. Meta releases them for free and the community fine-tunes them for specific tasks.
- Llama 3.1 (8B, 70B, 405B): The standard open model. The 8B version runs on a laptop.
- Llama 3.2 (1B, 3B): Tiny models for edge devices and fast local inference.
Alibaba (Qwen)
The best open-weight models for their size right now. Qwen 2.5 competes with models twice its parameter count. Strong at code, math, and multilingual tasks.
- Qwen 2.5 (7B, 14B, 32B, 72B): Excellent quality relative to size.
- Qwen Coder: Specialized for code generation and understanding.
DeepSeek
A Chinese lab producing surprisingly strong models at low cost. DeepSeek-R1 is a reasoning model that competes with o3 at a fraction of the price.
- DeepSeek-V3: General-purpose, very cheap API.
- DeepSeek-R1: Reasoning model. Shows its thinking process. Strong at math and code.
Cloud vs Local
Models come in two forms:
Cloud APIs: You send a request, pay per token, get a response. No hardware needed. OpenAI, Anthropic, Google, and DeepSeek all offer APIs.
Local models: You download the weights and run inference on your own machine. Free after the initial download. Llama, Qwen, and DeepSeek all have open weights you can run with Ollama.
You'll see these called "open-weight" rather than "open-source." The model files (weights) are free to download and use, but the training code and data usually aren't fully released. It's the model itself that's open, not everything that went into making it.
| | Cloud API | Local (Ollama) |
|---|---|---|
| Cost | Pay per token | Free (you pay electricity) |
| Speed | Fast (dedicated GPUs) | Depends on your hardware |
| Privacy | Data leaves your machine | Everything stays local |
| Model size | Unlimited | Limited by your RAM/VRAM |
| Setup | API key and you're done | Download model, run Ollama |
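To make "pay per token" concrete, here's a quick cost estimator. The prices below are illustrative placeholders, not real quotes; check each provider's pricing page for current rates.

```python
def api_cost(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate the cost of one API call, given prices per million tokens."""
    return (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m

# Illustrative prices (USD per million tokens) -- NOT current rates.
flagship = api_cost(10_000, 1_000, price_in_per_m=5.00, price_out_per_m=15.00)
budget   = api_cost(10_000, 1_000, price_in_per_m=0.15, price_out_per_m=0.60)

print(f"flagship: ${flagship:.4f}")  # $0.0650
print(f"budget:   ${budget:.4f}")    # $0.0021
```

The gap is the whole argument for budget tiers: the same 10k-token request costs roughly 30x less on the cheap model, which is why "use the mini model for simple tasks" keeps coming up.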
Model Sizes
Model size is measured in parameters (billions). More parameters generally mean a more capable model, but also one that's slower and more expensive to run.
| Size | Example | Runs on | Good for |
|---|---|---|---|
| 1-3B | Llama 3.2 1B | Phone, Raspberry Pi | Simple classification, extraction |
| 7-8B | Qwen 2.5 7B, Llama 3.1 8B | Laptop (16GB RAM) | General tasks, code, chat |
| 14-32B | Qwen 2.5 32B | Desktop (32GB+ RAM) | Complex reasoning, long context |
| 70B+ | Llama 3.1 70B | Server with GPU | Near-frontier quality |
| Frontier | GPT-4o, Claude Sonnet | Cloud only | Best quality, highest cost |
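The "runs on" column follows from simple arithmetic: each parameter takes some number of bits in memory. Here's a rough back-of-envelope estimate; the 4-bit default and the ~20% overhead factor are assumptions for quantized local inference, not exact figures.

```python
def approx_memory_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Rough memory estimate for running a model locally.

    bits_per_weight: 16 for full precision, 8 or 4 for quantized
    (4-bit is common for local setups). overhead is a loose ~20%
    fudge factor for the KV cache and runtime.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

print(round(approx_memory_gb(8), 1))   # ~4.8 GB: an 8B model fits a 16GB laptop
print(round(approx_memory_gb(70), 1))  # ~42.0 GB: a 70B model needs a server GPU
```

This is why the table maps 7-8B models to laptops and 70B+ to GPU servers: at 4-bit quantization, memory needs scale at roughly 0.6 GB per billion parameters.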
The Agents Layer
On top of these models sit agents: tools that use models to do work for you.
- Kiro: AI coding agent in your terminal
- Cursor / Windsurf: AI-powered code editors
- ChatGPT / Claude.ai: Chat interfaces
- GitHub Copilot: Inline code completion
These agents choose which model to use (or let you choose), add system prompts, manage context, and call tools. Understanding the underlying models helps you use agents better because you know what they can and can't do under the hood.
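One way to internalize the "right tool for the job" idea from the start of this chapter is to write the decision down as code. This is a toy heuristic, not how any agent actually routes requests; the model names are just examples drawn from the families above.

```python
def pick_model(needs_privacy: bool, needs_reasoning: bool, simple_task: bool) -> str:
    """Toy routing heuristic based on the trade-offs in this chapter."""
    if needs_privacy:
        # Data can't leave the machine: use open weights via Ollama
        return "qwen2.5:7b" if simple_task else "llama3.1:70b"
    if needs_reasoning:
        return "o4-mini"          # "thinking" models for math and logic
    if simple_task:
        return "gpt-4o-mini"      # cheap tier for classification/extraction
    return "claude-sonnet"        # capable general-purpose default

print(pick_model(needs_privacy=True, needs_reasoning=False, simple_task=True))
```

The point isn't the specific picks; it's that once you know the landscape, the choice becomes a handful of explicit trade-offs (privacy, reasoning depth, cost) rather than a default.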
Key Takeaways
- Six major families: OpenAI (GPT), Anthropic (Claude), Google (Gemini), Meta (Llama), Alibaba (Qwen), DeepSeek
- Cloud APIs are easy but cost money. Local models are free but need hardware
- Model size (parameters) correlates with capability, but bigger isn't always better for your task
- Open-weight models (Llama, Qwen, DeepSeek) can run locally with Ollama
- Agents (Kiro, Cursor, ChatGPT) are interfaces built on top of these models
- The landscape changes fast with new models releasing monthly, but the framework for evaluating them stays the same