Lab 17: Building a RAG System in Practice

Objective

Build a complete RAG (Retrieval-Augmented Generation) pipeline. By the end you will be able to:

  • Explain why RAG solves hallucination and knowledge cutoff problems

  • Implement the full RAG pipeline: ingest → embed → retrieve → generate

  • Apply chunking strategies for different document types

  • Evaluate RAG system quality


What is RAG and Why Does It Exist?

LLMs have two fundamental limitations:

  1. Knowledge cutoff — they only know what was in their training data (e.g., before April 2024)

  2. Hallucination — they generate plausible-sounding but potentially false information when uncertain

RAG addresses both by retrieving relevant documents at query time and providing them to the model as context:

WITHOUT RAG:
  User: "What were InnoZverse's Q3 2025 results?"
  LLM:  "InnoZverse's Q3 2025 results showed..."  [HALLUCINATION]

WITH RAG:
  User: "What were InnoZverse's Q3 2025 results?"
  Step 1: Search knowledge base for "InnoZverse Q3 2025"
  Step 2: Retrieve: actual_q3_report.pdf, pages 3-7
  Step 3: "Based on the Q3 2025 report: revenue was £4.2M..."  [GROUNDED]

The RAG Architecture

  Ingest (offline):  documents → chunks → embeddings → vector store
  Answer (online):   question → query embedding → top-k chunks → prompt with context → LLM → grounded answer

Step 1: Document Ingestion and Chunking

Raw documents must be split into chunks before embedding — you can't embed a 500-page PDF as one vector.
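The idea can be sketched as a fixed-size character splitter with overlap. The `chunk_text` helper below and its default sizes are illustrative, not the lab's prescribed splitter:

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters. Each chunk
    repeats the last `overlap` characters of the previous one, so sentences
    straddling a boundary are not lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

In practice you would split on sentence or paragraph boundaries rather than raw character offsets; LangChain's `RecursiveCharacterTextSplitter` is a common choice.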

Chunking Strategies by Document Type

  • Prose (reports, articles): split on paragraph or sentence boundaries, with overlap
  • Code: split on function or class boundaries so each chunk is self-contained
  • Tables/CSV: keep rows together with their header so values stay interpretable
  • Markdown/HTML: split on headings to preserve section context

Step 2: Embedding

Convert each chunk into a vector representation:
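As a stand-in for a real embedding model (a sentence-transformers model, a hosted embedding API, etc.), here is a toy deterministic embedding that hashes character trigrams into a small unit vector. It is enough to make the rest of the pipeline runnable, but it only captures surface overlap, not meaning:

```python
import hashlib
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy deterministic embedding: hash character trigrams into `dim`
    buckets, then L2-normalise so similarity can be a dot product."""
    vec = [0.0] * dim
    for i in range(max(len(text) - 2, 0)):
        trigram = text[i:i + 3].lower()
        h = int(hashlib.md5(trigram.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Texts that share trigrams land near each other in this space; a real embedding model places texts near each other by meaning, even with no shared words.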


Step 3: Vector Store

Store chunks and their embeddings:
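A vector store can be sketched as an in-memory list searched by similarity. `InMemoryVectorStore` is a hypothetical minimal class; production systems use FAISS, Chroma, pgvector, or similar:

```python
class InMemoryVectorStore:
    """Minimal in-memory vector store: keeps (chunk, embedding) pairs and
    returns the top-k chunks by dot-product similarity (equivalent to
    cosine similarity when the embeddings are unit-normalised)."""

    def __init__(self):
        self.entries = []  # list of (chunk_text, vector) pairs

    def add(self, chunk: str, vector: list[float]) -> None:
        self.entries.append((chunk, vector))

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        scored = sorted(
            self.entries,
            key=lambda e: -sum(a * b for a, b in zip(e[1], query_vector)),
        )
        return [chunk for chunk, _ in scored[:k]]
```

Real stores add what this sketch omits: approximate nearest-neighbour indexing for scale, metadata filtering, and persistence.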


Step 4: Generation — The Full RAG Chain
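Putting the pieces together, a toy end-to-end chain might look like the sketch below. The embedding is a hash trick standing in for a real model, the "vector store" is a plain list, and the final LLM call is omitted; the point is how retrieved chunks are injected into the prompt:

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hash character trigrams into a unit-length vector (toy stand-in
    for a real embedding model)."""
    vec = [0.0] * dim
    for i in range(max(len(text) - 2, 0)):
        h = int(hashlib.md5(text[i:i + 3].lower().encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, index: list, k: int = 2) -> list[str]:
    """Return the top-k chunks by dot-product similarity to the query."""
    qv = toy_embed(query)
    scored = sorted(index, key=lambda e: -sum(a * b for a, b in zip(e[1], qv)))
    return [chunk for chunk, _ in scored[:k]]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Inject retrieved chunks as grounding context for the LLM."""
    context = "\n\n".join(chunks)
    return ("Answer the question using ONLY the context below. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

# Ingest: chunk -> embed -> store
docs = ["Q3 2025 revenue was 4.2M GBP.", "The office cat is named Vector."]
index = [(chunk, toy_embed(chunk)) for chunk in docs]

# Query: embed -> retrieve -> assemble the grounded prompt for the LLM
prompt = build_prompt("What was Q3 2025 revenue?",
                      retrieve("Q3 2025 revenue", index, k=1))
```

The resulting `prompt` string is what you would send to the model; the "ONLY use the context" instruction is what turns retrieval into grounding.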


Advanced RAG Patterns

Hybrid Search: Keyword + Semantic
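One simple way to blend the two signals is a weighted sum of a keyword score and a semantic score. The sketch below uses plain term overlap for the keyword half; a real system would use BM25 (e.g. via the `rank_bm25` package), and `alpha` is an assumed blending weight:

```python
def hybrid_score(query: str, chunk: str, semantic_score: float,
                 alpha: float = 0.5) -> float:
    """Blend keyword overlap with semantic similarity. `alpha` weights
    the keyword half; (1 - alpha) weights the semantic half."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    keyword_score = len(q_terms & c_terms) / (len(q_terms) or 1)
    return alpha * keyword_score + (1 - alpha) * semantic_score
```

The keyword half catches exact terms (product names, ticket IDs, "Q3 2025") that embedding similarity can miss, which is precisely the failure mode hybrid search exists to fix.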

Self-Query: AI Generates the Filter
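The self-query pattern asks the LLM to translate the user's natural-language question into a structured metadata filter, which is applied before (or alongside) the semantic search. The sketch below assumes the model has already returned the filter as JSON:

```python
import json

def apply_self_query_filter(filter_json: str, documents: list[dict]) -> list[dict]:
    """Apply a metadata filter that an LLM is assumed to have produced
    from the user's question (e.g. '{"year": 2025}'), keeping only
    documents whose metadata matches every key in the filter."""
    conditions = json.loads(filter_json)
    return [
        doc for doc in documents
        if all(doc.get("meta", {}).get(k) == v for k, v in conditions.items())
    ]
```

Filtering on metadata first ("only 2025 documents") shrinks the candidate set, so the semantic search can't be distracted by similar-sounding but out-of-scope chunks.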


Evaluating RAG Quality
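As one crude, runnable example of an evaluation signal, the function below checks what fraction of the answer's claims have all their terms present in the retrieved context. Real evaluations (for instance with a framework such as Ragas) use an LLM judge to score faithfulness and answer relevance; this is only a term-match approximation:

```python
def context_recall(answer_claims: list[str], retrieved_text: str) -> float:
    """Fraction of answer claims whose terms all appear in the retrieved
    context: a crude proxy for faithfulness. A claim with any term
    missing from the context counts as unsupported."""
    haystack = retrieved_text.lower()
    supported = sum(
        1 for claim in answer_claims
        if all(term in haystack for term in claim.lower().split())
    )
    return supported / (len(answer_claims) or 1)
```

A score well below 1.0 means the model is asserting things its retrieved context does not contain, i.e. it is likely hallucinating past the documents.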


Common RAG Failure Modes

  Problem                  Symptom                               Fix
  Chunks too large         Retrieved context irrelevant          Reduce chunk_size to 200-400 chars
  Chunks too small         Missing context, incomplete answers   Increase chunk_size + overlap
  Wrong top-k              Missing relevant docs                 Increase k; use MMR for diversity
  No overlap               Answers cut off at chunk boundaries   Increase chunk_overlap
  Keyword mismatch         Can't find exact terms                Add BM25 hybrid retrieval
  Model ignores context    Still hallucinating                   Stronger prompt: "ONLY use context"


Further Reading
