01. The Model Landscape
The Big Picture
There are six major families of AI models you'll run into. Each comes from a different company, has different strengths, and sits at a different price point. Once you understand the landscape, you stop defaulting to "just use ChatGPT" and start picking the right tool for the job.
The Families
OpenAI (GPT)
The most well-known. GPT-4o is the flagship: fast, good at everything, strong at structured output and code. GPT-4o-mini is the budget option and it's surprisingly capable for simple tasks at a fraction of the cost.
- GPT-4o: General-purpose workhorse. Good at code, writing, analysis, structured output.
- GPT-4o-mini: 90% of GPT-4o quality for 95% less cost. Use for simple tasks.
- o3 / o4-mini: "Thinking" models. They spend extra time reasoning before answering. Best for math, logic, and complex code.
Anthropic (Claude)
Claude is known for following instructions precisely, handling long documents well, and being strong at writing and analysis. Many developers prefer it over GPT for coding tasks.
- Claude Sonnet: The sweet spot. Fast, capable, good at code and long context.
- Claude Haiku: Ultra-fast and cheap. Great for classification, extraction, and simple tasks.
- Claude Opus: Most capable but expensive and slower. For the hardest problems.
Google (Gemini)
Gemini's standout feature is context window size. Gemini 2.5 Pro handles up to 1 million tokens, meaning you can feed it an entire codebase or a 500-page document in one shot. Also strong at multimodal tasks (text + images).
- Gemini 2.5 Pro: Huge context, strong reasoning, competitive pricing.
- Gemini 2.5 Flash: Fast and cheap, good for high-volume tasks.
Meta (Llama)
Open-weight models you can run locally or on your own infrastructure. No API costs. Meta releases them for free and the community fine-tunes them for specific tasks.
- Llama 3.1 (8B, 70B, 405B): The standard open model. The 8B version runs on a laptop.
- Llama 3.2 (1B, 3B): Tiny models for edge devices and fast local inference.
Alibaba (Qwen)
The best open-weight models for their size right now. Qwen 2.5 competes with models twice its parameter count. Strong at code, math, and multilingual tasks.
- Qwen 2.5 (7B, 14B, 32B, 72B): Excellent quality relative to size.
- Qwen Coder: Specialized for code generation and understanding.
DeepSeek
A Chinese lab producing surprisingly strong models at low cost. DeepSeek-R1 is a reasoning model that competes with o3 at a fraction of the price.
- DeepSeek-V3: General-purpose, very cheap API.
- DeepSeek-R1: Reasoning model. Shows its thinking process. Strong at math and code.
Cloud vs Local
Models come in two forms:
Cloud APIs: You send a request, pay per token, get a response. No hardware needed. OpenAI, Anthropic, Google, and DeepSeek all offer APIs.
Local models: You download the weights and run inference on your own machine. Free after the initial download. Llama, Qwen, and DeepSeek all have open weights you can run with Ollama.
You'll see these called "open-weight" rather than "open-source." The model files (weights) are free to download and use, but the training code and data usually aren't fully released. It's the model itself that's open, not everything that went into making it.
| | Cloud API | Local (Ollama) |
|---|---|---|
| Cost | Pay per token | Free (you pay electricity) |
| Speed | Fast (dedicated GPUs) | Depends on your hardware |
| Privacy | Data leaves your machine | Everything stays local |
| Model size | Unlimited | Limited by your RAM/VRAM |
| Setup | API key and you're done | Download model, run Ollama |
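To make "pay per token" concrete, here's a quick cost estimator. The prices below are illustrative placeholders, not real quotes; check each provider's pricing page for current rates.

```python
def api_cost(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate the cost of one API call, given prices per million tokens."""
    return (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m

# Illustrative prices (USD per million tokens) -- NOT current rates.
flagship = api_cost(10_000, 1_000, price_in_per_m=5.00, price_out_per_m=15.00)
budget   = api_cost(10_000, 1_000, price_in_per_m=0.15, price_out_per_m=0.60)

print(f"flagship: ${flagship:.4f}")  # $0.0650
print(f"budget:   ${budget:.4f}")    # $0.0021
```

The gap is the whole argument for budget tiers: the same 10k-token request costs roughly 30x less on the cheap model, which is why "use the mini model for simple tasks" keeps coming up.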
Model Sizes
Model size is measured in parameters (billions). More parameters generally mean a more capable model, but also one that's slower and more expensive to run.
| Size | Example | Runs on | Good for |
|---|---|---|---|
| 1-3B | Llama 3.2 1B | Phone, Raspberry Pi | Simple classification, extraction |
| 7-8B | Qwen 2.5 7B, Llama 3.1 8B | Laptop (16GB RAM) | General tasks, code, chat |
| 14-32B | Qwen 2.5 32B | Desktop (32GB+ RAM) | Complex reasoning, long context |
| 70B+ | Llama 3.1 70B | Server with GPU | Near-frontier quality |
| Frontier | GPT-4o, Claude Sonnet | Cloud only | Best quality, highest cost |
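The "runs on" column follows from simple arithmetic: each parameter takes some number of bits in memory. Here's a rough back-of-envelope estimate; the 4-bit default and the ~20% overhead factor are assumptions for quantized local inference, not exact figures.

```python
def approx_memory_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Rough memory estimate for running a model locally.

    bits_per_weight: 16 for full precision, 8 or 4 for quantized
    (4-bit is common for local setups). overhead is a loose ~20%
    fudge factor for the KV cache and runtime.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

print(round(approx_memory_gb(8), 1))   # ~4.8 GB: an 8B model fits a 16GB laptop
print(round(approx_memory_gb(70), 1))  # ~42.0 GB: a 70B model needs a server GPU
```

This is why the table maps 7-8B models to laptops and 70B+ to GPU servers: at 4-bit quantization, memory needs scale at roughly 0.6 GB per billion parameters.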
The Agents Layer
On top of these models sit agents: tools that use models to do work for you.
- Kiro: AI coding agent in your terminal
- Cursor / Windsurf: AI-powered code editors
- ChatGPT / Claude.ai: Chat interfaces
- GitHub Copilot: Inline code completion
These agents choose which model to use (or let you choose), add system prompts, manage context, and call tools. Understanding the underlying models helps you use agents better because you know what they can and can't do under the hood.
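One way to internalize the "right tool for the job" idea from the start of this chapter is to write the decision down as code. This is a toy heuristic, not how any agent actually routes requests; the model names are just examples drawn from the families above.

```python
def pick_model(needs_privacy: bool, needs_reasoning: bool, simple_task: bool) -> str:
    """Toy routing heuristic based on the trade-offs in this chapter."""
    if needs_privacy:
        # Data can't leave the machine: use open weights via Ollama
        return "qwen2.5:7b" if simple_task else "llama3.1:70b"
    if needs_reasoning:
        return "o4-mini"          # "thinking" models for math and logic
    if simple_task:
        return "gpt-4o-mini"      # cheap tier for classification/extraction
    return "claude-sonnet"        # capable general-purpose default

print(pick_model(needs_privacy=True, needs_reasoning=False, simple_task=True))
```

The point isn't the specific picks; it's that once you know the landscape, the choice becomes a handful of explicit trade-offs (privacy, reasoning depth, cost) rather than a default.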
Key Takeaways
- Six major families: OpenAI (GPT), Anthropic (Claude), Google (Gemini), Meta (Llama), Alibaba (Qwen), DeepSeek
- Cloud APIs are easy but cost money. Local models are free but need hardware
- Model size (parameters) correlates with capability, but bigger isn't always better for your task
- Open-weight models (Llama, Qwen, DeepSeek) can run locally with Ollama
- Agents (Kiro, Cursor, ChatGPT) are interfaces built on top of these models
- The landscape changes fast with new models releasing monthly, but the framework for evaluating them stays the same