Lab 17: Vector Databases & pgvector
Step 1: Vector Databases — The Why
Traditional SQL:
    SELECT * FROM products WHERE category = 'shoes';
    → Exact match, deterministic
Vector Search:
    SELECT * FROM products
    ORDER BY description_embedding <-> query_embedding
    LIMIT 10;
    → Semantic similarity: finds "sneakers", "footwear", "trainers" too
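The contrast above can be sketched without a database. The snippet below is a minimal illustration, not how pgvector is implemented: the `PRODUCTS` list and its three-dimensional embeddings are invented for the example (a real embedding model produces hundreds of dimensions), and the `ORDER BY ... <-> ... LIMIT n` query is imitated with a brute-force sort on Euclidean distance, the metric that pgvector's `<->` operator computes.

```python
import math

# Hypothetical toy catalog: (name, category, embedding). The 3-d vectors are
# invented for illustration; a real embedding model would produce them.
PRODUCTS = [
    ("running shoes", "shoes", [0.90, 0.10, 0.00]),
    ("sneakers", "footwear", [0.85, 0.15, 0.05]),
    ("trainers", "sportswear", [0.80, 0.20, 0.10]),
    ("coffee maker", "kitchen", [0.00, 0.10, 0.95]),
]


def exact_match(category):
    """Like WHERE category = ...: only rows with the literal value match."""
    return [name for name, cat, _ in PRODUCTS if cat == category]


def l2_distance(a, b):
    """Euclidean distance, the metric behind pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vector_search(query_embedding, limit):
    """Like ORDER BY embedding <-> query LIMIT n, done by brute force."""
    ranked = sorted(PRODUCTS, key=lambda p: l2_distance(p[2], query_embedding))
    return [name for name, _, _ in ranked[:limit]]


print(exact_match("shoes"))
print(vector_search([0.90, 0.10, 0.00], limit=3))
```

The exact match returns only the row whose category is literally 'shoes', while the distance-ordered search also surfaces the rows filed under other categories, because their vectors sit close to the query vector.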
Use cases:
✓ Semantic document search (find by meaning, not keywords)
✓ Product recommendations ("customers who liked this also liked...")
✓ Image similarity search
✓ RAG (Retrieval-Augmented Generation) for LLM applications
✓ Anomaly detection (distance from cluster center)
✓ Duplicate detection
✓ Facial recognition
Step 2: pgvector — PostgreSQL Extension Setup
Step 3: Similarity Metrics — L2, Cosine, Inner Product
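As a rough reference for the three metrics named in this step, here is a pure-Python sketch of what each one computes. The mapping to pgvector operators is: `<->` for L2 distance, `<=>` for cosine distance, and `<#>` for the negative inner product (negated so that smaller still means closer). The function names and sample vectors are assumptions made for illustration.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance; corresponds to pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; corresponds to pgvector's <=> operator.

    Undefined for zero-length vectors (this sketch would divide by zero).
    """
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def negative_inner_product(a, b):
    """Inner product negated; corresponds to pgvector's <#> operator."""
    return -inner_product(a, b)


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # orthogonal to a

for name, other in (("b", b), ("c", c)):
    print(name, l2_distance(a, other), cosine_distance(a, other),
          negative_inner_product(a, other))
```

Comparing `a` with `b` shows why the choice matters: cosine distance ignores vector length and treats them as identical in direction, while L2 distance still separates them. For embeddings normalized to unit length, the three metrics produce the same ranking.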
Step 4: IVFFlat Index — Approximate Nearest Neighbor
Dataset Size | lists | probes (for 95% recall)
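To show what the `lists` and `probes` knobs control, here is a toy inverted-file index in pure Python. It is a sketch of the general IVFFlat idea under simplifying assumptions (plain k-means, 2-d random data, hypothetical helper names), not pgvector's actual implementation: vectors are grouped into `lists` clusters at build time, and a query scans only the `probes` clusters whose centroids are nearest to it.

```python
import math
import random


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, lists, iterations=10, seed=0):
    """Plain k-means; returns `lists` centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, lists)]
    for _ in range(iterations):
        buckets = [[] for _ in range(lists)]
        for v in vectors:
            nearest = min(range(lists), key=lambda i: l2(v, centroids[i]))
            buckets[nearest].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def build_index(vectors, lists):
    """Assign every row id to the inverted list of its nearest centroid."""
    centroids = kmeans(vectors, lists)
    inverted = [[] for _ in range(lists)]
    for row_id, v in enumerate(vectors):
        nearest = min(range(lists), key=lambda i: l2(v, centroids[i]))
        inverted[nearest].append(row_id)
    return centroids, inverted


def search(vectors, index, query, k, probes):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    centroids, inverted = index
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [row_id for i in order[:probes] for row_id in inverted[i]]
    candidates.sort(key=lambda row_id: l2(vectors[row_id], query))
    return candidates[:k]


rng = random.Random(42)
vectors = [[rng.random(), rng.random()] for _ in range(500)]
index = build_index(vectors, lists=10)
query = [0.5, 0.5]

exact = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))[:5]
approx = search(vectors, index, query, k=5, probes=2)
full = search(vectors, index, query, k=5, probes=10)

print("overlap with exact top 5 at probes=2:", len(set(approx) & set(exact)) / 5)
print("probes == lists matches brute force:", sorted(full) == sorted(exact))
```

Raising `probes` scans more lists, trading speed for recall; once `probes` equals `lists`, every row is scanned and the search is exact again. In pgvector the corresponding settings are the `lists` index option and the `ivfflat.probes` session parameter.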
Step 5: HNSW Index — High Recall Nearest Neighbor
Step 6: Real-World pgvector Patterns
Step 7: Vector Database Comparison
Step 8: Capstone — pgvector Similarity Demo
Summary
Concept | pgvector Syntax | Notes
