03 - Talking to LLMs with Go
The Ollama API
Ollama exposes a REST API on localhost:11434. You send a JSON request, you get a JSON response. No SDK needed, just net/http and encoding/json.
# Make sure Ollama is running and you have a model
ollama pull llama3.2
ollama serve   # starts the API server (may already be running)

The endpoint we care about is /api/chat. It takes a model name and an array of messages, just like OpenAI's API.
Your First API Call
A complete Go program that sends a message to Ollama and prints the response.
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type ChatRequest struct {
    Model    string    `json:"model"`
    Messages []Message `json:"messages"`
    Stream   bool      `json:"stream"`
}

type ChatResponse struct {
    Message Message `json:"message"`
}

func main() {
    req := ChatRequest{
        Model: "llama3.2", // OpenAI: "gpt-4o"
        Messages: []Message{
            {Role: "user", Content: "What is Go good at? One sentence."},
        },
        Stream: false,
    }

    body, _ := json.Marshal(req)
    resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    defer resp.Body.Close()

    data, _ := io.ReadAll(resp.Body)
    var result ChatResponse
    json.Unmarshal(data, &result)
    fmt.Println(result.Message.Content)
    // "Go excels at building concurrent, high-performance network services and CLI tools."
}

That's it. No SDK, no dependencies. Ollama's message format matches OpenAI's, so the same messages array works with almost any provider.
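The response carries more than the message, too. If you want token counts, extend the struct; a sketch with a few of the fields Ollama reports in its non-streaming /api/chat response (unknown JSON fields are simply ignored by encoding/json, so this is a drop-in swap for ChatResponse):

// Optional: capture token counts from the same response.
type ChatResponseFull struct {
    Message         Message `json:"message"`
    Done            bool    `json:"done"`              // generation finished?
    PromptEvalCount int     `json:"prompt_eval_count"` // tokens in the prompt
    EvalCount       int     `json:"eval_count"`        // tokens generated
}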
The Message Roles
Every message has a role. Three roles cover everything.
messages := []Message{
    {Role: "system", Content: "You are a Go expert. Be concise."},
    {Role: "user", Content: "What are goroutines?"},
}

// system: sets the model's behavior and personality
// user: the human's input
// assistant: the model's previous responses (for multi-turn conversations)

The system message is optional but powerful. It shapes every response the model generates. We'll cover this in depth in the prompt engineering lesson.
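To see the effect, send the same question with and without a system message. A quick sketch using the chat helper defined later in this lesson (outputs are illustrative and vary by model):

curt, _ := chat([]Message{
    {Role: "system", Content: "Answer in five words or fewer."},
    {Role: "user", Content: "What are goroutines?"},
})
full, _ := chat([]Message{
    {Role: "user", Content: "What are goroutines?"},
})
fmt.Println(curt) // e.g. "Lightweight concurrent functions."
fmt.Println(full) // typically a multi-sentence explanation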
Multi-Turn Conversations
LLMs are stateless. They don't remember previous messages. To have a conversation, you send the entire history with every request.
messages := []Message{
    {Role: "system", Content: "You are a helpful Go tutor."},
    {Role: "user", Content: "What is an interface?"},
    {Role: "assistant", Content: "An interface in Go defines a set of method signatures..."},
    {Role: "user", Content: "Give me an example."},
}

// The model sees all four messages and responds to the last one
// with full context of the conversation

Each turn adds tokens. A 20-message conversation means sending all 20 messages every time. Context windows and token budgeting matter because of this.
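In code, the usual pattern is to keep one slice and append both sides of each turn to it. A sketch using the chat helper defined in the next section:

history := []Message{
    {Role: "system", Content: "You are a helpful Go tutor."},
}
for _, q := range []string{"What is an interface?", "Give me an example."} {
    // Add the user's turn, send the whole history, then add the reply
    // so the next request carries full context.
    history = append(history, Message{Role: "user", Content: q})
    reply, err := chat(history)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    history = append(history, Message{Role: "assistant", Content: reply})
    fmt.Println(reply)
}

When the history grows past the model's context window, a common tactic is to drop or summarize the oldest non-system messages before sending.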
Wrapping It in a Function
A reusable function that calls Ollama and returns the response text.
func chat(messages []Message) (string, error) {
    req := ChatRequest{
        Model:    "llama3.2",
        Messages: messages,
        Stream:   false,
    }

    body, err := json.Marshal(req)
    if err != nil {
        return "", err
    }

    resp, err := http.Post(
        "http://localhost:11434/api/chat",
        "application/json",
        bytes.NewReader(body),
    )
    if err != nil {
        return "", fmt.Errorf("ollama request failed: %w", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        data, _ := io.ReadAll(resp.Body)
        return "", fmt.Errorf("ollama error %d: %s", resp.StatusCode, data)
    }

    var result ChatResponse
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return "", fmt.Errorf("failed to parse response: %w", err)
    }
    return result.Message.Content, nil
}

Call it:
func main() {
    reply, err := chat([]Message{
        {Role: "user", Content: "Explain channels in Go. Two sentences."},
    })
    if err != nil {
        fmt.Println("Error:", err)
        return
    }
    fmt.Println(reply)
    // "Channels are typed conduits for sending and receiving values between goroutines.
    // They provide synchronization, ensuring one goroutine waits until another is ready."
}

Switching to OpenAI
The OpenAI API uses the same message format. Change the URL, add an API key header, and adjust the response shape slightly.
// Ollama: no auth needed
url := "http://localhost:11434/api/chat"

// OpenAI: same message format, different endpoint + auth header
url = "https://api.openai.com/v1/chat/completions"

// With Ollama, http.Post is enough.
// With OpenAI, you need to set headers, so use http.NewRequest:
req, _ := http.NewRequest("POST", url, bytes.NewReader(body))
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
resp, err := http.DefaultClient.Do(req)

The messages array is identical for both. Only the URL, the auth header, and the shape of the response change.
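That last difference is worth pinning down: OpenAI nests the reply in a choices array instead of a top-level message field. A minimal struct for decoding it:

type OpenAIChatResponse struct {
    Choices []struct {
        Message Message `json:"message"`
    } `json:"choices"`
}

// The reply text is result.Choices[0].Message.Content
// (check len(result.Choices) > 0 before indexing).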
Error Handling
LLM APIs fail. The model might not be loaded, Ollama might not be running, or the request might be malformed. Always check errors.
resp, err := http.Post(url, "application/json", bytes.NewReader(body))
if err != nil {
    // Network error: Ollama not running, connection refused
    return "", fmt.Errorf("connection failed: %w", err)
}
defer resp.Body.Close()

if resp.StatusCode != 200 {
    // API error: model not found, invalid request
    errBody, _ := io.ReadAll(resp.Body)
    return "", fmt.Errorf("API error %d: %s", resp.StatusCode, errBody)
}

Common errors:
- connection refused: Ollama isn't running. Start it with ollama serve
- model not found: You haven't pulled the model. Run ollama pull llama3.2
- context length exceeded: Your input is too long for the model's context window
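You can catch the first of these before sending any chat request. A small preflight sketch, assuming the default address (Ollama's root path responds when the server is up):

// Preflight: fail fast with a clear message if Ollama is down.
resp, err := http.Get("http://localhost:11434/")
if err != nil {
    fmt.Println("Ollama isn't running. Start it with: ollama serve")
    return
}
resp.Body.Close()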
Key Takeaways
- Ollama's /api/chat endpoint takes a model name and a messages array
- No SDK needed: net/http and encoding/json are all you need
- Three message roles: system (behavior), user (input), assistant (the model's prior responses)
- LLMs are stateless. Send the full conversation history with every request
- The same message format works for Ollama, OpenAI, and most other providers
- Always handle errors: connection failures, API errors, context length limits