07 - Embeddings and Vector Search
What Is an Embedding
You can't do math on words. But you can do math on numbers. An embedding model converts text into a list of numbers, and suddenly "cat" and "kitten" are close together while "cat" and "database" are far apart.
"cat" → [0.21, -0.45, 0.87, 0.12, ...]
"kitten" → [0.23, -0.42, 0.85, 0.14, ...] ← close to "cat"
"database" → [-0.67, 0.31, -0.12, 0.55, ...] ← far from "cat"These are simplified. Real embeddings have 768 or 1536 dimensions, and the values come from the model, not from any formula you write. You don't look at individual numbers. You compare entire vectors.
The payoff: "How do I fix a nil pointer?" and "null reference error in Go" have zero words in common, but their embeddings are nearly identical. That's how you search by meaning instead of keywords.
Generating Embeddings with Ollama
The goal for this lesson: take any piece of text, turn it into a vector, and use that vector to find similar text. By the end, you'll have a working semantic search in ~50 lines of Go.
First step: generate embeddings. Ollama has a dedicated model for this called nomic-embed-text. Pull it, then call the /api/embed endpoint.
ollama pull nomic-embed-text

Then use the model in your embedding calls:
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

type EmbedRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type EmbedResponse struct {
	Embeddings [][]float64 `json:"embeddings"`
}

func embed(text string) ([]float64, error) {
	req := EmbedRequest{Model: "nomic-embed-text", Input: text}
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post("http://localhost:11434/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	var result EmbedResponse
	if err := json.Unmarshal(data, &result); err != nil {
		return nil, err
	}
	if len(result.Embeddings) == 0 {
		return nil, fmt.Errorf("no embeddings returned")
	}
	return result.Embeddings[0], nil
}

func main() {
	vec, err := embed("Go is a compiled language")
	if err != nil {
		panic(err)
	}
	fmt.Printf("Dimensions: %d\n", len(vec))
	fmt.Printf("First 5: %v\n", vec[:5])
	// Dimensions: 768
	// First 5: [0.011954149 -0.029821327 -0.11647157 -0.046213374 0.05328403]
	// Your numbers will differ. The dimensions will be the same.
}

nomic-embed-text produces 768-dimensional vectors. OpenAI's text-embedding-3-small produces 1536 dimensions. More dimensions mean more nuance, but also more storage and computation.
Cosine Similarity
How do you measure if two embeddings are similar? Cosine similarity. It compares the direction of two vectors, ignoring their length. A score of 1.0 means identical meaning, 0.0 means completely unrelated.
You don't need to understand the math deeply. The function takes two vectors and returns a number between -1 and 1. Higher means more similar.
import "math"
func cosineSimilarity(a, b []float64) float64 {
var dot, normA, normB float64
for i := range a {
dot += a[i] * b[i]
normA += a[i] * a[i]
normB += b[i] * b[i]
}
if normA == 0 || normB == 0 {
return 0
}
return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}Now embed three sentences and compare them:
func main() {
	v1, _ := embed("How do I handle errors in Go?")
	v2, _ := embed("Go error handling best practices")
	v3, _ := embed("Best pizza restaurants in New York")
	fmt.Printf("v1 vs v2: %.4f\n", cosineSimilarity(v1, v2))
	fmt.Printf("v1 vs v3: %.4f\n", cosineSimilarity(v1, v3))
	// v1 vs v2: 0.8215 ← similar (same topic)
	// v1 vs v3: 0.5523 ← less related
	// Your scores will differ, but v1-v2 should always be higher than v1-v3.
}

Semantic search works on this principle. You don't match keywords. You match meaning.
Building a Simple Vector Search
You can embed text and compare two vectors. Now put the pieces together: embed each document in a collection once, upfront. When a user asks a question, embed the query at search time and compare it against every stored vector to find the closest documents. This is the core of semantic search, and it's what powers RAG (next lesson).
import "sort"
type Document struct {
Text string
Embedding []float64
}
func search(docs []Document, query string, topK int) []Document {
queryVec, _ := embed(query)
type scored struct {
doc Document
score float64
}
var results []scored
for _, doc := range docs {
score := cosineSimilarity(queryVec, doc.Embedding)
results = append(results, scored{doc, score})
}
sort.Slice(results, func(i, j int) bool {
return results[i].score > results[j].score
})
top := make([]Document, 0, topK)
for i := 0; i < topK && i < len(results); i++ {
top = append(top, results[i].doc)
}
return top
}Try it with a few Go-related documents:
func main() {
	// Index some documents
	texts := []string{
		"Go uses goroutines for lightweight concurrency",
		"Error handling in Go uses the error interface",
		"Channels allow goroutines to communicate safely",
		"The defer keyword schedules cleanup functions",
		"Slices are dynamically-sized views into arrays",
	}
	var docs []Document
	for _, t := range texts {
		vec, _ := embed(t)
		docs = append(docs, Document{Text: t, Embedding: vec})
	}

	// Search
	results := search(docs, "How do goroutines talk to each other?", 2)
	for _, r := range results {
		fmt.Println("-", r.Text)
	}
	// - Channels allow goroutines to communicate safely
	// - Go uses goroutines for lightweight concurrency
}

The query "How do goroutines talk to each other?" matches "Channels allow goroutines to communicate safely" even though they share almost no words. That's semantic search.
Vector Databases
The in-memory search above works for small datasets. For real applications, you need a vector database that handles storage, indexing, and fast similarity search at scale.
- pgvector: PostgreSQL extension. Use your existing database
- ChromaDB: Good for prototyping
- Pinecone: Managed cloud service
- Qdrant: Open source, high performance
- Weaviate: Open source, feature-rich
pgvector is the pragmatic choice if you already use PostgreSQL. No new infrastructure, just an extension. You store embeddings alongside your data and query them with SQL.
-- Enable pgvector (run once)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(768) NOT NULL
);

-- Insert a document (you'd do this from Go, passing the embedding)
INSERT INTO documents (content, embedding)
VALUES ('Go uses goroutines for concurrency', '[0.021, -0.089, ...]');

-- Find the 5 most similar documents to a query embedding
-- In Go, you'd embed the query first, then pass the vector here
SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;

The <=> operator computes cosine distance. $1 is the query embedding you pass as a parameter from your Go code. Lower distance means higher similarity, so 1 - distance gives you a similarity score.
When to Use Embeddings
Embeddings are the right tool when you need to find semantically similar content. They're the wrong tool when you need exact matches.
Good use cases:
- ✓ Search ("find docs about error handling")
- ✓ Recommendations ("similar articles")
- ✓ Clustering ("group tickets by topic")
- ✓ RAG retrieval ("find relevant context")
Bad use cases:
- ✗ Exact lookup ("find user with ID 42")
- ✗ Keyword filtering ("posts tagged 'golang'")
- ✗ Structured queries ("orders over $100")
Key Takeaways
- Embeddings convert text into numerical vectors that capture meaning
- Similar text produces similar vectors, regardless of exact wording
- Cosine similarity measures how similar two vectors are (1.0 = identical, 0.0 = unrelated)
- Ollama generates embeddings with nomic-embed-text (768 dimensions)
- Vector search finds semantically similar documents without keyword matching
- For production, use a vector database like pgvector instead of in-memory search
- Embeddings power RAG retrieval, semantic search, recommendations, and clustering