05 - Prompt Engineering
What Prompt Engineering Is
Prompt engineering is writing instructions that get reliable, useful output from an LLM. The same question phrased two different ways can produce wildly different results. The model is only as good as what you ask it.
// Vague prompt, vague answer
messages := []Message{
    {Role: "user", Content: "Tell me about errors"},
}
// "Errors are mistakes that occur in programs..."

// Specific prompt, useful answer
messages := []Message{
    {Role: "user", Content: "Explain Go error handling. Cover the error interface, wrapping with fmt.Errorf, and errors.Is. Use code examples."},
}
// Detailed, structured response with code

System Prompts
The system message tells the model how to behave. You send it with every request, and the model gives it more weight than user messages.
messages := []Message{
    {Role: "system", Content: "You are a Go expert. Answer in 1-3 sentences. Include a code example when relevant. Never apologize or hedge."},
    {Role: "user", Content: "What is a defer statement?"},
}

With that system prompt, the model responds concisely:
Output: "defer schedules a function call to run when the surrounding function returns.
It's commonly used for cleanup like closing files or releasing locks.

func readFile(path string) {
    f, _ := os.Open(path)
    defer f.Close()
    // f.Close() runs when readFile returns
}"

Without the system prompt, the model might give a paragraph of background, apologize for any confusion, and hedge with "it depends." The system prompt eliminates that.
Few-Shot Prompting
Instead of describing the output format you want, show the model examples. You do this by adding fake user/assistant message pairs before your real question. The model sees the pattern and follows it.
"Few-shot" means you provide a few examples. "Zero-shot" means no examples, just the instruction. "One-shot" is a single example.
messages := []Message{
    {Role: "system", Content: "Convert natural language to SQL."},
    // Example 1 (user asks, assistant answers)
    {Role: "user", Content: "all users older than 30"},
    {Role: "assistant", Content: "SELECT * FROM users WHERE age > 30;"},
    // Example 2
    {Role: "user", Content: "top 5 products by revenue"},
    {Role: "assistant", Content: "SELECT * FROM products ORDER BY revenue DESC LIMIT 5;"},
    // Real question
    {Role: "user", Content: "users who signed up this month"},
}
// The model follows the pattern: natural language in, SQL out
// "SELECT * FROM users WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE);"

These aren't training examples. The model reads them as part of the conversation and mimics the pattern. Two examples are enough for simple formats. For complex or inconsistent patterns, use 3-5.
Why not just say "return SQL" in the system prompt? Because it's vague. The model doesn't know if you want uppercase keywords, semicolons, or table aliases. Two input/output pairs answer all of that without a word of explanation.
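If you reuse the same examples across many requests, building the message list programmatically keeps them in one place. A sketch, again assuming the Message type from earlier chapters (the fewShot helper is a hypothetical name, not a library API):

// fewShot builds a message list from a system prompt, a set of
// example input/output pairs, and the real question.
func fewShot(system string, examples [][2]string, question string) []Message {
    msgs := []Message{{Role: "system", Content: system}}
    for _, ex := range examples {
        msgs = append(msgs,
            Message{Role: "user", Content: ex[0]},      // example input
            Message{Role: "assistant", Content: ex[1]}, // example output
        )
    }
    return append(msgs, Message{Role: "user", Content: question})
}

// Usage:
// msgs := fewShot("Convert natural language to SQL.",
//     [][2]string{{"all users older than 30", "SELECT * FROM users WHERE age > 30;"}},
//     "users who signed up this month")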
Chain-of-Thought
For complex reasoning, tell the model to think step by step. This dramatically improves accuracy on math, logic, and multi-step problems.
messages := []Message{
    {Role: "system", Content: "Think step by step before giving your final answer."},
    {Role: "user", Content: "A server handles 100 requests/second. Each request uses 2MB of memory and takes 200ms. What's the peak memory usage?"},
}

The model shows its reasoning:
Output:
"Step 1: At 100 req/s with 200ms per request, there are 100 × 0.2 = 20 concurrent requests.
Step 2: Each uses 2MB, so peak memory = 20 × 2MB = 40MB.
Peak memory usage: 40MB."

Without "think step by step," the model often jumps to an answer and gets it wrong. Chain-of-thought forces it to show its work, which improves the reasoning process itself.
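One practical wrinkle: when the model shows its reasoning, your code has to separate the steps from the answer. A common trick is to ask for a marker line and split on it. A minimal sketch (the "Final answer:" marker is a convention you choose in the prompt, not anything the model requires):

// System prompt: "Think step by step. End with a line starting 'Final answer:'."
// needs: import "strings"
func extractFinalAnswer(output string) string {
    const marker = "Final answer:"
    if i := strings.LastIndex(output, marker); i >= 0 {
        return strings.TrimSpace(output[i+len(marker):])
    }
    return strings.TrimSpace(output) // no marker: fall back to the whole reply
}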
Temperature
Temperature controls randomness. The range is 0.0 to 2.0. Low values (0.0-0.3) make output more deterministic and focused. High values (0.7-1.0) make it more creative and varied. Above 1.0 gets increasingly chaotic and is rarely useful.
body := map[string]any{
    "model":    "llama3.2",
    "messages": messages,
    "stream":   false,
    "options": map[string]any{
        "temperature": 0.1, // very focused, nearly deterministic
    },
}
// OpenAI equivalent: "temperature": 0.1

See the difference:
temperature: 0.0
→ "Go is a statically typed, compiled language."
(same answer every time)

temperature: 1.0
→ "Go is this beautifully pragmatic language..."
(different each time, more creative)

Use low temperature for factual tasks (code generation, data extraction, classification). Use higher temperature for creative tasks (brainstorming, writing, conversation).
Temperature is an API parameter. You control it when calling the API from code; in an interactive ollama run session you can set it with /set parameter temperature 0.1. Most chat UIs (ChatGPT, Claude) and CLI agents (Kiro, Cursor) don't expose it. They pick a low temperature internally because their tasks need focused, deterministic output.
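Putting it together, here is a minimal sketch of sending a chat request with the temperature option set, using Ollama's /api/chat endpoint on its default port (assumes the Message type from earlier chapters):

// chat sends a request with an explicit temperature.
// needs: import ("bytes"; "encoding/json"; "net/http")
func chat(messages []Message, temperature float64) (*http.Response, error) {
    body, err := json.Marshal(map[string]any{
        "model":    "llama3.2",
        "messages": messages,
        "stream":   false,
        "options":  map[string]any{"temperature": temperature},
    })
    if err != nil {
        return nil, err
    }
    // caller is responsible for closing the response body
    return http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
}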
Constraining Output
Left to itself, the model rambles. Ask "what's the sentiment?" and you get three paragraphs of analysis. Constraints tell the model exactly what to return and what to skip.
Three types of constraints work well:
Format: Tell the model the exact shape of the response.
messages := []Message{
    {Role: "system", Content: `Classify the sentiment of the user's message.
Respond with exactly one word: positive, negative, or neutral.`},
    {Role: "user", Content: "This product is terrible and I want a refund."},
}
// Output: "negative"
// Without the constraint: "The sentiment appears to be negative because
// the user expresses dissatisfaction with the product and is requesting..."

Exclusions: Tell the model what NOT to do. Models love to hedge, apologize, and add disclaimers. Shut that down.
{Role: "system", Content: `Answer the question directly.
Do not explain your reasoning.
Do not say "it depends."
Do not start with "Great question."`}Scope: Limit what the model considers.
{Role: "system", Content: `You are a Go code reviewer.
Only consider the standard library.
Do not suggest third-party packages.
Do not rewrite the code, just list issues.`}The more specific your constraints, the more predictable the output. Vague instructions get vague results.
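Constrained output is also easier to validate. If you asked for exactly one word, check for exactly one word, and retry or fall back when the model drifts. A minimal sketch, mirroring the sentiment prompt above:

// validateSentiment checks that the model actually returned one of the
// three allowed labels. Returns "" if the output violated the constraint.
// needs: import "strings"
func validateSentiment(output string) string {
    word := strings.ToLower(strings.TrimSpace(output))
    switch word {
    case "positive", "negative", "neutral":
        return word
    }
    return "" // constraint violated: retry or log
}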
Prompt Patterns That Work
A few patterns that consistently produce better results.
Role assignment: "You are a senior Go developer reviewing code for production readiness."
Output format: "Respond as a JSON object with keys: summary, issues, score."
Negative constraints: "Do not include disclaimers. Do not say 'it depends.'"
Scope limiting: "Only consider the standard library. Do not suggest third-party packages."
messages := []Message{
    {Role: "system", Content: `You are a code reviewer. For each code snippet:
1. List bugs (if any)
2. List improvements (max 3)
3. Rate quality: good, acceptable, or poor
Respond in plain text, not markdown.`},
    {Role: "user", Content: `func divide(a, b int) int {
    return a / b
}`},
}
// Structured, predictable output every time

Common Mistakes
Too vague. The model matches the effort of your prompt.
❌ "Help me with Go"
→ Generic overview of the Go programming language
✅ "Write a function that reads a CSV file and returns a map[string]int of column sums"
→ Working code you can use

Too much context. Dumping an entire codebase into the prompt wastes tokens and confuses the model. The model tries to consider everything you send. Send only what's relevant to the question.
❌ Pasting 2,000 lines of code + "find the bug"
✅ Pasting the 30-line function that fails + the error message

No examples. Describing the format in words leaves room for interpretation. One example removes all ambiguity. If you want JSON with specific keys, show one complete JSON object (a parsing sketch closes this section).
❌ "Return the data as JSON"
→ Could be any shape, any keys, any nesting
✅ "Return JSON like this: {"name": "Go", "year": 2009}"
→ Exact shape, every time

Ignoring the system prompt. You could put everything in user messages and it would work. But models are fine-tuned to treat the system role as instructions and the user role as input. The system message carries more weight. It also keeps your app's instructions separate from the human's input, so user messages can't accidentally override your rules.
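Circling back to the JSON example above: the payoff of pinning down an exact shape is that you can parse the reply straight into a struct. A minimal sketch (LangInfo is a hypothetical name; assumes the model returned only the JSON object, with no surrounding prose):

// Parse the model's constrained JSON reply into a typed struct.
// needs: import "encoding/json"
type LangInfo struct {
    Name string `json:"name"`
    Year int    `json:"year"`
}

func parseReply(reply string) (LangInfo, error) {
    var info LangInfo
    err := json.Unmarshal([]byte(reply), &info)
    return info, err // fails loudly if the model drifted from the shape
}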
Key Takeaways
- Specific prompts produce better results than vague ones
- System prompts carry more weight than user messages. Include them in every request
- Few-shot prompting (showing examples) is more reliable than describing what you want
- Chain-of-thought ("think step by step") improves reasoning on complex problems
- Temperature controls randomness: low for facts, high for creativity
- Constrain the output format explicitly: "respond with one word" or "respond as JSON"
- Common mistakes: too vague, too much context, no examples, ignoring system prompts