01. The Model Landscape


The Big Picture

There are six major families of AI models you'll run into. Each comes from a different company, has different strengths, and sits at a different price point. Once you understand the landscape, you stop defaulting to "just use ChatGPT" and start picking the right tool for the job.

The Families

OpenAI (GPT)

The most well-known family. GPT-4o is the flagship: fast, good at everything, strong at structured output and code. GPT-4o-mini is the budget option, and it's surprisingly capable for simple tasks at a fraction of the cost.

  • GPT-4o: General-purpose workhorse. Good at code, writing, analysis, structured output.
  • GPT-4o-mini: 90% of GPT-4o quality for 95% less cost. Use for simple tasks.
  • o3 / o4-mini: "Thinking" models. They spend extra time reasoning before answering. Best for math, logic, and complex code.
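Because API pricing is per token, the flagship-vs-mini cost gap is easy to quantify. A minimal sketch with illustrative placeholder prices (check each provider's pricing page for current numbers before relying on them):

```python
# Rough cost comparison for pay-per-token APIs.
# Prices below are illustrative placeholders, not current list prices.

def request_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical prices: flagship at $2.50 in / $10 out per million tokens,
# mini at $0.15 in / $0.60 out per million tokens.
flagship = request_cost(2_000, 500, 2.50, 10.00)
mini = request_cost(2_000, 500, 0.15, 0.60)

print(f"flagship: ${flagship:.4f}  mini: ${mini:.4f}")
# At these placeholder prices the mini model is ~17x cheaper per request.
```

Run the same prompt through both tiers and the arithmetic usually makes the routing decision for you: simple tasks rarely justify flagship prices.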

Anthropic (Claude)

Claude is known for following instructions precisely, handling long documents well, and being strong at writing and analysis. Many developers prefer it over GPT for coding tasks.

  • Claude Sonnet: The sweet spot. Fast, capable, good at code and long context.
  • Claude Haiku: Ultra-fast and cheap. Great for classification, extraction, and simple tasks.
  • Claude Opus: Most capable but expensive and slower. For the hardest problems.

Google (Gemini)

Gemini's standout feature is context window size. Gemini 2.5 Pro handles up to 1 million tokens, meaning you can feed it an entire codebase or a 500-page document in one shot. Also strong at multimodal tasks (text + images).

  • Gemini 2.5 Pro: Huge context, strong reasoning, competitive pricing.
  • Gemini 2.5 Flash: Fast and cheap, good for high-volume tasks.
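To get a feel for what a 1-million-token window holds, a common rule of thumb is that English prose averages roughly 4 characters per token. A rough estimator (real counts require the model's own tokenizer, so treat this as ballpark only):

```python
# Back-of-the-envelope token estimate for sizing text against a context window.
# English prose averages ~4 characters per token; exact counts need the
# model's own tokenizer, so this is an estimate, not a guarantee.

CHARS_PER_TOKEN = 4  # rough average for English prose

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, context_window: int = 1_000_000) -> bool:
    return estimate_tokens(text) <= context_window

# A 500-page document at ~2,000 characters per page:
doc = "x" * (500 * 2_000)
print(estimate_tokens(doc))   # ~250,000 tokens
print(fits_in_context(doc))   # True: well within a 1M-token window
```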

Meta (Llama)

Open-weight models you can run locally or on your own infrastructure. No API costs. Meta releases them for free and the community fine-tunes them for specific tasks.

  • Llama 3.1 (8B, 70B, 405B): The standard open model. The 8B version runs on a laptop.
  • Llama 3.2 (1B, 3B): Tiny models for edge devices and fast local inference.

Alibaba (Qwen)

Among the best open-weight models for their size right now. Qwen 2.5 competes with models twice its parameter count. Strong at code, math, and multilingual tasks.

  • Qwen 2.5 (7B, 14B, 32B, 72B): Excellent quality relative to size.
  • Qwen Coder: Specialized for code generation and understanding.

DeepSeek

A Chinese lab producing surprisingly strong models at low cost. DeepSeek-R1 is a reasoning model that competes with o3 at a fraction of the price.

  • DeepSeek-V3: General-purpose, very cheap API.
  • DeepSeek-R1: Reasoning model. Shows its thinking process. Strong at math and code.

Cloud vs Local

Models come in two forms:

Cloud APIs: You send a request, pay per token, get a response. No hardware needed. OpenAI, Anthropic, Google, and DeepSeek all offer APIs.
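To make "send a request, pay per token" concrete, here is a sketch of a chat request payload in the OpenAI-style schema. The field names are an assumption about that style; other providers use different shapes, and nothing is sent over the network here:

```python
# Sketch of an OpenAI-style chat request body (field names follow that
# common schema; other providers differ). Built and inspected locally --
# no request is actually sent, so no API key is needed.
import json

payload = {
    "model": "gpt-4o-mini",  # pick the cheap tier for a simple task
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this in one sentence: ..."},
    ],
    "max_tokens": 100,  # cap output length -- you pay per token
}

body = json.dumps(payload)
print(body[:60])
```

The same three-part structure (model name, message list, generation limits) appears in most chat APIs, which is why switching providers is usually a small code change.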

Local models: You download the weights and run inference on your own machine. Free after the initial download. Llama, Qwen, and DeepSeek all have open weights you can run with Ollama.

You'll see these called "open-weight" rather than "open-source." The model files (weights) are free to download and use, but the training code and data usually aren't fully released. It's the model itself that's open, not everything that went into making it.

            Cloud API                  Local (Ollama)
Cost        Pay per token              Free (you pay electricity)
Speed       Fast (dedicated GPUs)      Depends on your hardware
Privacy     Data leaves your machine   Everything stays local
Model size  Unlimited                  Limited by your RAM/VRAM
Setup       API key and you're done    Download model, run Ollama

Model Sizes

Model size is measured in parameters (billions). More parameters generally mean a more capable model, but also a slower and more expensive one.

Size      Example                     Runs on              Good for
1-3B      Llama 3.2 1B                Phone, Raspberry Pi  Simple classification, extraction
7-8B      Qwen 2.5 7B, Llama 3.1 8B   Laptop (16GB RAM)    General tasks, code, chat
14-32B    Qwen 2.5 32B                Desktop (32GB+ RAM)  Complex reasoning, long context
70B+      Llama 3.1 70B               Server with GPU      Near-frontier quality
Frontier  GPT-4o, Claude Sonnet       Cloud only           Best quality, highest cost
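A useful rule of thumb for whether a model fits on your hardware: memory for the weights is roughly parameters times bytes per weight, before runtime overhead like the KV cache. A minimal sketch:

```python
# Rule-of-thumb memory footprint: parameters x bytes per weight.
# Ignores KV cache and runtime overhead, so real usage runs higher.

def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB for a given quantization level."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight  # 1B params ~= 1 GB at 8 bits

# An 8B model in fp16 vs 4-bit quantization:
print(weights_gb(8, 16))  # 16.0 GB -- too big for many laptops
print(weights_gb(8, 4))   # 4.0 GB -- fits comfortably in 16 GB of RAM
```

This is why local runners commonly ship quantized (e.g. 4-bit) variants: the same 8B model drops from 16 GB of weights to about 4 GB at a modest quality cost.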

The Agents Layer

On top of these models sit agents, tools that use models to do work:

  • Kiro: AI coding agent in your terminal
  • Cursor / Windsurf: AI-powered code editors
  • ChatGPT / Claude.ai: Chat interfaces
  • GitHub Copilot: Inline code completion

These agents choose which model to use (or let you choose), add system prompts, manage context, and call tools. Understanding the underlying models helps you use agents better because you know what they can and can't do under the hood.

Key Takeaways

  • Six major families: OpenAI (GPT), Anthropic (Claude), Google (Gemini), Meta (Llama), Alibaba (Qwen), DeepSeek
  • Cloud APIs are easy but cost money. Local models are free but need hardware
  • Model size (parameters) correlates with capability, but bigger isn't always better for your task
  • Open-weight models (Llama, Qwen, DeepSeek) can run locally with Ollama
  • Agents (Kiro, Cursor, ChatGPT) are interfaces built on top of these models
  • The landscape changes fast with new models releasing monthly, but the framework for evaluating them stays the same

