Lab 03: Vector Database Architecture
Overview
Architecture
┌────────────────────────────────────────────────────────────┐
│ Vector Database Architecture                               │
├────────────────────────────────────────────────────────────┤
│ Embedding Models (text/image/audio) → Dense Vectors        │
│ Dimension: 384 (MiniLM), 768 (BERT), 1536 (OpenAI ada-002) │
├────────────────────────────────────────────────────────────┤
│ Index Layer:                                               │
│ ├── Flat Index (exact, small datasets <100K)               │
│ ├── IVF (inverted file, medium datasets)                   │
│ └── HNSW (hierarchical graph, large datasets, high recall) │
├────────────────────────────────────────────────────────────┤
│ Query: embedding → ANN search → Top-K results              │
└────────────────────────────────────────────────────────────┘
Step 1: Why Vector Databases?
Dimension | Relational DB | Vector DB
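To make the contrast concrete, here is a toy sketch: a relational-style exact-match lookup finds nothing for a reworded query, while a similarity search over embeddings returns the closest document. The 3-dimensional vectors are hand-made stand-ins for real embeddings (which would have 384 to 1536 dimensions); the documents, vectors, and function names are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus: hand-made 3-D vectors stand in for real embeddings.
docs = {
    "How do I reset my password?": np.array([0.9, 0.1, 0.0]),
    "Shipping times for international orders": np.array([0.0, 0.2, 0.95]),
    "Update billing information": np.array([0.2, 0.9, 0.1]),
}

def exact_match(query_text):
    """Relational-style lookup, equivalent to WHERE text = :query."""
    return [text for text in docs if text == query_text]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(query_vec, k=1):
    """Vector-style lookup: rank every document by cosine similarity to the query."""
    scored = sorted(docs.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# A reworded query; its (hypothetical) embedding lies close to the password document.
query_text = "I forgot my login credentials"
query_vec = np.array([0.85, 0.15, 0.05])

print(exact_match(query_text))  # []
print(nearest(query_vec))       # ['How do I reset my password?']
```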
Step 2: Similarity Metrics
Metric | Best For | Notes
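Three widely used metrics are cosine similarity, dot (inner) product, and Euclidean (L2) distance. A minimal NumPy sketch, assuming float vectors; it also checks a useful identity: for unit-length vectors, ||a - b||^2 = 2 - 2 cos(a, b), so once embeddings are normalized, L2 distance and cosine similarity produce the same ranking. The random 384-dimensional vectors are synthetic.

```python
import numpy as np

def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]; ignores vector magnitude."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot_product(a, b):
    """Inner product; sensitive to magnitude as well as direction."""
    return float(np.dot(a, b))

def euclidean_distance(a, b):
    """Straight-line (L2) distance; smaller means more similar."""
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)

# Normalize to unit length, as many embedding pipelines do.
a_unit, b_unit = a / np.linalg.norm(a), b / np.linalg.norm(b)

cos = cosine_similarity(a_unit, b_unit)
# On unit vectors, cosine similarity equals the dot product...
assert np.isclose(cos, dot_product(a_unit, b_unit))
# ...and squared L2 distance is a decreasing function of it: ||a - b||^2 = 2 - 2 cos.
assert np.isclose(euclidean_distance(a_unit, b_unit) ** 2, 2 - 2 * cos)
print(f"cosine={cos:.4f}, L2={euclidean_distance(a_unit, b_unit):.4f}")
```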
Step 3: Index Types
Vectors | Requirement | Recommended Index
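The IVF idea from the architecture diagram can be sketched in a few lines: cluster the vectors with k-means, store each vector in the inverted list of its nearest centroid, and at query time scan only the `nprobe` closest lists. This is a simplified illustration, not how any particular library implements IVF; the class name `SimpleIVF`, the plain Lloyd's k-means, and the parameter values are assumptions.

```python
import numpy as np

class SimpleIVF:
    """Hypothetical minimal IVF index using squared L2 distance."""

    def __init__(self, n_lists=16, n_iter=10, seed=0):
        self.n_lists = n_lists
        self.n_iter = n_iter
        self.rng = np.random.default_rng(seed)

    def _assign(self, x):
        # Index of the nearest centroid for every row of x.
        d = ((x[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    def build(self, vectors):
        self.vectors = vectors
        # Initialize centroids from random data points, then run Lloyd's k-means.
        idx = self.rng.choice(len(vectors), self.n_lists, replace=False)
        self.centroids = vectors[idx].copy()
        for _ in range(self.n_iter):
            labels = self._assign(vectors)
            for c in range(self.n_lists):
                members = vectors[labels == c]
                if len(members):  # keep the old centroid if a cluster empties
                    self.centroids[c] = members.mean(axis=0)
        labels = self._assign(vectors)
        # Inverted lists: centroid id -> ids of the vectors assigned to it.
        self.lists = [np.flatnonzero(labels == c) for c in range(self.n_lists)]
        return self

    def search(self, query, k=5, nprobe=2):
        # Probe only the nprobe lists whose centroids are closest to the query.
        centroid_d = ((self.centroids - query) ** 2).sum(axis=1)
        probe = np.argsort(centroid_d)[:nprobe]
        candidates = np.concatenate([self.lists[c] for c in probe])
        d = ((self.vectors[candidates] - query) ** 2).sum(axis=1)
        return candidates[np.argsort(d)[:k]]


data = np.random.default_rng(1).normal(size=(2000, 64))
index = SimpleIVF(n_lists=16).build(data)
print(index.search(data[42], k=5, nprobe=4))  # first id is 42 (distance 0 to itself)
```

Raising `nprobe` scans more lists, which improves recall at the cost of speed; with `nprobe` equal to `n_lists` the search degenerates into an exact flat scan.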
Step 4: Vector Database Comparison
Feature | pgvector | Pinecone | Weaviate | Chroma
Step 5: Dimensionality Reduction with PCA
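PCA projects embeddings onto the directions of greatest variance, trading some fidelity for smaller vectors and cheaper distance computations. A minimal NumPy sketch via SVD, assuming a matrix of row vectors; the function name `pca_reduce`, the synthetic data, and the 768 to 128 target are illustrative choices, not values from this lab.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (computed via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # PCA requires centered data
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / (S ** 2).sum()          # variance ratio per component
    reduced = Xc @ components.T
    return reduced, components, mean, explained[:n_components].sum()

# Synthetic "embeddings": 1000 vectors of dimension 768 with low-rank structure plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 32))
X = latent @ rng.normal(size=(32, 768)) + 0.1 * rng.normal(size=(1000, 768))

reduced, components, mean, kept = pca_reduce(X, n_components=128)
print(reduced.shape)                               # (1000, 128)
print(f"variance retained: {kept:.3f}")

# Queries must be projected with the same mean and components as the corpus.
query_reduced = (X[0] - mean) @ components.T
```

Real embeddings rarely have such clean low-rank structure, so the retained variance (and the resulting recall) should be measured on the actual corpus before choosing a target dimension.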
Step 6: Approximate Nearest Neighbor Trade-offs
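One way to see an approximate-search trade-off concretely is to compress the vectors and measure what the compression costs in recall. The sketch below applies simple per-dimension 8-bit scalar quantization (a swapped-in illustration, not a technique named in this lab) and compares its top-10 results against exact float32 search using recall@k, the fraction of the true nearest neighbors that the approximate search returns. The data, sizes, and function names are hypothetical.

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def quantize_8bit(X, lo, hi):
    """Map each dimension's range [lo, hi] linearly onto the integers 0..255."""
    scaled = (X - lo) / (hi - lo) * 255.0
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)

def dequantize(Q, lo, hi):
    return Q.astype(np.float32) / 255.0 * (hi - lo) + lo

def top_k_l2(data, query, k):
    d = ((data - query) ** 2).sum(axis=1)
    return np.argsort(d)[:k].tolist()

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 128)).astype(np.float32)
queries = rng.normal(size=(50, 128)).astype(np.float32)
k = 10

lo, hi = data.min(axis=0), data.max(axis=0)        # per-dimension value range
codes = quantize_8bit(data, lo, hi)                # 1 byte per dimension instead of 4
approx_data = dequantize(codes, lo, hi)

recalls = [
    recall_at_k(top_k_l2(approx_data, q, k), top_k_l2(data, q, k))
    for q in queries
]
print(f"memory: {data.nbytes:,} -> {codes.nbytes:,} bytes")  # 2,560,000 -> 640,000
print(f"mean recall@{k}: {np.mean(recalls):.3f}")
```

The same recall@k measurement applies to any approximate index: sweep its accuracy knob (for example, the number of probed lists in IVF) and plot recall against latency or memory.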
Step 7: Production Vector DB Design
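Capacity planning usually starts with back-of-the-envelope memory arithmetic. A rough sketch, assuming float32 vectors held in RAM and an HNSW-style graph whose bottom layer stores about 2·M neighbor ids of 4 bytes each per vector; it ignores upper graph layers, metadata, and replication, so treat the result as a lower bound. The helper name and the default M=16 are hypothetical.

```python
def estimate_hnsw_memory_bytes(n_vectors, dim, m=16, bytes_per_value=4, id_bytes=4):
    """Rough lower-bound RAM estimate for an HNSW-style index (hypothetical helper)."""
    vector_bytes = n_vectors * dim * bytes_per_value   # raw float32 vectors
    link_bytes = n_vectors * 2 * m * id_bytes          # ~2*M neighbor ids per vector, base layer only
    return vector_bytes + link_bytes

# One million MiniLM-sized (384-d) vs. ada-002-sized (1536-d) embeddings.
for dim in (384, 1536):
    total = estimate_hnsw_memory_bytes(1_000_000, dim)
    print(f"dim={dim}: ~{total / 1e9:.2f} GB")  # ~1.66 GB and ~6.27 GB
```

The vector term dominates, which is why dimensionality reduction and quantization are the main levers for fitting a large index into memory.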
Step 8: Capstone — Build Vector Similarity Engine
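A minimal sketch of a vector similarity engine along the lines of the architecture diagram: it stores normalized vectors with ids and optional metadata, supports exact (flat) cosine top-K search with an optional metadata filter, and deletes by id. The class and method names (`VectorSimilarityEngine`, `add`, `search`, `delete`) are hypothetical; a larger deployment would replace the flat scan with an IVF or HNSW index.

```python
import numpy as np

class VectorSimilarityEngine:
    """Hypothetical in-memory engine: flat (exact) cosine search over stored vectors."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []
        self.metadata = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, item_id, vector, metadata=None):
        v = np.asarray(vector, dtype=np.float32)
        if v.shape != (self.dim,):
            raise ValueError(f"expected shape ({self.dim},), got {v.shape}")
        norm = np.linalg.norm(v)
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        # Store unit vectors so that a dot product equals cosine similarity.
        self.vectors = np.vstack([self.vectors, v / norm])
        self.ids.append(item_id)
        self.metadata.append(metadata or {})

    def delete(self, item_id):
        i = self.ids.index(item_id)  # raises ValueError if the id is unknown
        self.vectors = np.delete(self.vectors, i, axis=0)
        del self.ids[i]
        del self.metadata[i]

    def search(self, query, k=5, where=None):
        """Return up to k (id, score) pairs, optionally filtered by exact metadata matches."""
        q = np.asarray(query, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = self.vectors @ q
        # Pre-filter: keep only rows whose metadata matches every key/value in `where`.
        allowed = [
            i for i, meta in enumerate(self.metadata)
            if not where or all(meta.get(key) == val for key, val in where.items())
        ]
        ranked = sorted(allowed, key=lambda i: scores[i], reverse=True)[:k]
        return [(self.ids[i], float(scores[i])) for i in ranked]


engine = VectorSimilarityEngine(dim=3)
engine.add("doc-a", [1.0, 0.0, 0.0], {"lang": "en"})
engine.add("doc-b", [0.9, 0.1, 0.0], {"lang": "de"})
engine.add("doc-c", [0.0, 1.0, 0.0], {"lang": "en"})

print(engine.search([1.0, 0.05, 0.0], k=2))                       # doc-a, then doc-b
print(engine.search([1.0, 0.05, 0.0], k=2, where={"lang": "en"}))  # doc-a, then doc-c
engine.delete("doc-a")
print(engine.search([1.0, 0.05, 0.0], k=1))                       # doc-b
```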
Summary
Concept | Key Points
