Lab 05: RAG at Scale
Overview
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                    RAG at Scale Architecture                    │
├──────────────────────────────┬──────────────────────────────────┤
│ INGESTION PIPELINE           │ QUERY PIPELINE                   │
│ Documents → Parse            │ Query → Embedding                │
│   → Chunk (strategy)         │   → BM25 sparse search           │
│   → Embed (model)            │   → Vector dense search          │
│   → Index (vector DB)        │   → RRF fusion                   │
│   → BM25 index               │   → Re-ranker (cross-encoder)    │
│                              │   → Context compression          │
│                              │   → LLM → Answer                 │
└──────────────────────────────┴──────────────────────────────────┘
Step 1: Document Ingestion Pipeline
Step 2: Chunking Strategies
Strategy | Chunk Size | Overlap | Best For
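As a point of reference for the Chunk Size and Overlap columns, here is a minimal sketch of the simplest strategy: fixed-size character chunks with a sliding overlap. The function name `chunk_text` and its default sizes are illustrative assumptions, not values taken from this lab.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters.

    Hypothetical helper for illustration; sizes are in characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

A larger overlap reduces the chance that a sentence is split across a chunk boundary, at the cost of indexing more duplicated text.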
Step 3: Embedding Models
Model | Dimension | Context | MTEB Score | Cost | Best For
Step 4: Hybrid Search (BM25 + Vector)
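The architecture diagram merges the BM25 and vector result lists with RRF (Reciprocal Rank Fusion). The sketch below shows the usual formulation, in which each document scores the sum of 1 / (k + rank) over every list it appears in. The function name `rrf_fuse`, the document IDs, and the constant k = 60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each inner list is ordered best-first. Ranks start at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    bm25_hits = ["d1", "d2", "d3"]    # hypothetical sparse results
    dense_hits = ["d3", "d1", "d4"]   # hypothetical dense results
    for doc_id, score in rrf_fuse([bm25_hits, dense_hits]):
        print(doc_id, round(score, 5))
```

RRF uses only rank positions, so the BM25 and cosine-similarity scores never need to be normalized onto a common scale.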
Step 5: Re-ranking
Step 6: Context Compression
Technique | Method | Compression Ratio | Quality Impact
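To make the idea of context compression concrete, here is a rough sketch of one simple extractive approach: keep only the sentences that share terms with the query. This is an assumed baseline for illustration, not necessarily one of the techniques this lab compares, and `compress_context` is a hypothetical helper.

```python
import re


def compress_context(query: str, passages: list[str], max_sentences: int = 3) -> str:
    """Keep up to `max_sentences` sentences with the most query-term overlap."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    scored = []  # (overlap count, original position, sentence)
    position = 0
    for passage in passages:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", passage.strip()):
            if not sentence:
                continue
            terms = set(re.findall(r"\w+", sentence.lower()))
            scored.append((len(terms & query_terms), position, sentence))
            position += 1
    # Pick the best-overlapping sentences, then restore document order.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    kept = [s for overlap, _, s in sorted(top, key=lambda item: item[1]) if overlap > 0]
    return " ".join(kept)


if __name__ == "__main__":
    passages = [
        "RRF merges ranked lists. The weather is nice.",
        "BM25 is a sparse ranking function.",
    ]
    compressed = compress_context("How does RRF merge ranked lists?", passages, max_sentences=2)
    original_length = sum(len(p) for p in passages)
    print(compressed)
    print("compression ratio:", len(compressed) / original_length)
```

Exact term matching misses paraphrases, so a production system would more likely score sentences with embeddings or a trained compressor.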
Step 7: RAG Evaluation Metrics
Metric | Measures | Formula
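As an illustration of how a retrieval metric's formula turns into code, the sketch below computes two widely used ones: recall@k (the fraction of relevant documents found in the top k results) and mean reciprocal rank (the average of 1 / rank of the first relevant hit). Choosing these two is an assumption; the lab's own metric list may differ, and the function names are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant document, over all queries.

    Each run is (retrieved IDs ordered best-first, set of relevant IDs).
    A query with no relevant document retrieved contributes 0.
    """
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


if __name__ == "__main__":
    print(recall_at_k(["a", "b", "c"], {"a", "c", "z"}, k=2))
    print(mean_reciprocal_rank([(["a", "b"], {"b"}), (["x", "y"], {"x"})]))
```

Both metrics judge the retriever only; generation-side qualities such as faithfulness of the final answer need separate evaluation.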
Step 8: Capstone — Hybrid RAG Search System
Summary
Concept | Key Points
