Lab 14: RAG Chatbot

Objective

Build a complete Retrieval-Augmented Generation (RAG) pipeline from scratch — document ingestion, chunking, embedding, vector retrieval, and response generation. Implement the patterns used by LangChain and production RAG systems, without requiring any external API keys.

Time: 55 minutes | Level: Practitioner | Docker Image: zchencow/innozverse-ai:latest


Background

RAG addresses a fundamental limitation of LLMs: they have no knowledge of your private data.

Without RAG:
  User: "What does our security policy say about password rotation?"
  LLM:  "I don't have access to your company's security policy."

With RAG:
  1. RETRIEVE: Search vector DB for relevant policy sections
  2. AUGMENT:  Add retrieved context to the prompt
  3. GENERATE: LLM answers grounded in your actual policy
  Result: "According to Section 4.2 of your security policy, passwords
           must be rotated every 90 days..."

Step 1: Environment Setup

📸 Verified Output:


Step 2: Document Ingestion and Chunking

📸 Verified Output:

💡 Chunk overlap ensures that sentences spanning chunk boundaries are captured by at least one chunk. Without overlap, a question about content at the boundary of two chunks might miss the relevant context.
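A character-based splitter with overlap can be sketched as follows, using the 200-character / 50-overlap settings from this lab's summary table (the function name is illustrative):

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, so a sentence that straddles a
    chunk boundary still appears whole in at least one chunk."""
    step = size - overlap  # advance by size minus overlap each time
    return [
        text[i:i + size]
        for i in range(0, max(len(text) - overlap, 1), step)
    ]
```

Each chunk starts `size - overlap` characters after the previous one, so the last 50 characters of one chunk are the first 50 of the next.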


Step 3: Embedding and Indexing

📸 Verified Output:
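One way to build the local LSA embedder named in the summary table is TF-IDF followed by truncated SVD, with L2 normalization so a plain dot product equals cosine similarity. This sketch uses scikit-learn; the sample chunks are placeholders, not this lab's corpus.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Placeholder chunks standing in for the ingested policy documents.
chunks = [
    "Passwords must be rotated every 90 days.",
    "Vulnerability scans run weekly on production hosts.",
    "Report compromised credentials to the security team immediately.",
]

# LSA = TF-IDF -> truncated SVD; the final Normalizer makes every
# row unit-length, so dot product == cosine similarity at query time.
lsa = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
    Normalizer(copy=False),
)
index = lsa.fit_transform(chunks)  # shape: (n_chunks, n_components)
```

At query time, the same fitted pipeline's `transform` maps a question into the same vector space as the indexed chunks.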


Step 4: Retrieval Pipeline

📸 Verified Output:
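With a normalized index like the one from the previous step, top-k retrieval reduces to a dot product and a sort. A minimal sketch (assuming both the index rows and the query vector are already L2-normalized):

```python
import numpy as np

def retrieve(query_vec, index, k=5):
    """Return (chunk_index, score) pairs for the top-k chunks.

    Assumes rows of `index` and `query_vec` are L2-normalized,
    so the dot product is cosine similarity."""
    scores = index @ query_vec
    top = np.argsort(scores)[::-1][:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in top]
```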


Step 5: Prompt Engineering for RAG

📸 Verified Output:
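A grounded RAG prompt typically does three things: restricts the model to the retrieved context, asks it to cite sources, and gives it an explicit out when the answer is missing. A sketch of such a builder (the `(source, text)` chunk format is an assumption, not this lab's exact schema):

```python
def build_rag_prompt(question, chunks):
    """Build a prompt that grounds the LLM in retrieved chunks.

    chunks: list of (source, text) pairs -- an assumed format.
    Bracketed source labels let the model cite where claims came from."""
    context = "\n\n".join(f"[{src}] {text}" for src, text in chunks)
    return (
        "Answer the question using ONLY the context below. "
        "Cite the bracketed source for each claim. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```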


Step 6: RAG Evaluation — Faithfulness and Relevance

📸 Verified Output:

💡 100% hit rate means the right document was always in the top-5 results. MRR of 0.94 means on average the right document was ranked 1st or 2nd — users would see it immediately.
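Both metrics in the tip above are simple to compute from labeled (query, relevant document) pairs. A sketch, assuming each query has exactly one relevant document:

```python
def hit_rate_and_mrr(results, k=5):
    """Compute top-k hit rate and Mean Reciprocal Rank.

    results: list of (retrieved_ids, relevant_id) pairs, where
    retrieved_ids is ranked best-first.
    Hit rate: fraction of queries with the relevant doc in the top k.
    MRR: mean of 1/rank of the relevant doc (0 when it is missed)."""
    hits, reciprocal = 0, 0.0
    for retrieved, relevant in results:
        top = list(retrieved[:k])
        if relevant in top:
            hits += 1
            reciprocal += 1.0 / (top.index(relevant) + 1)
    n = len(results)
    return hits / n, reciprocal / n
```

An MRR of 0.94 over many queries is consistent with the relevant document usually landing at rank 1 and occasionally at rank 2, as the tip notes.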


Step 7: Hybrid Search — Keyword + Semantic

📸 Verified Output:

💡 Hybrid search combines the precision of keyword matching with the recall of semantic search. Pure semantic found more relevant incident response documents; keyword found the exact "compromised" match. Hybrid gets the best of both.
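One common way to fuse the two signals is a weighted sum of min-max-normalized scores; normalizing first keeps one scorer's scale from dominating the other. A sketch (the equal `alpha=0.5` weighting is a common starting point, not a value taken from this lab):

```python
def hybrid_scores(keyword_scores, semantic_scores, alpha=0.5):
    """Blend per-document keyword and semantic scores.

    Each list holds one score per candidate document. Min-max
    normalize each signal to [0, 1], then mix with weight alpha
    (alpha=1.0 -> pure keyword, alpha=0.0 -> pure semantic)."""
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]
    kw, sem = minmax(keyword_scores), minmax(semantic_scores)
    return [alpha * k + (1 - alpha) * s for k, s in zip(kw, sem)]
```

Reciprocal Rank Fusion, which combines ranks instead of raw scores, is a popular alternative that avoids score normalization entirely.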


Step 8: Real-World Capstone — Security Policy Chatbot

📸 Verified Output:

💡 The chatbot correctly answers all 5 questions from policy documents — including the multi-part scanning schedule. No hallucination because responses are grounded in retrieved content.
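The capstone wiring can be sketched as a thin class that composes the earlier pieces and keeps conversation history, so follow-up questions retrieve against the full conversational context (the class and its callable parameters are illustrative, not this lab's exact API):

```python
class PolicyChatbot:
    """Compose retriever, prompt builder, and generator, with
    conversation history for contextual follow-ups.

    retrieve:     callable(query) -> list of chunks
    build_prompt: callable(question, chunks) -> prompt string
    generate:     callable(prompt) -> answer string
    (All three are stand-ins for the components built above.)"""

    def __init__(self, retrieve, build_prompt, generate):
        self.retrieve = retrieve
        self.build_prompt = build_prompt
        self.generate = generate
        self.history = []  # list of (question, answer) pairs

    def ask(self, question):
        # Prepend recent questions so a follow-up like "what about
        # laptops?" retrieves against the whole conversation, not
        # just the ambiguous last turn.
        recent = " ".join(q for q, _ in self.history[-3:])
        chunks = self.retrieve(f"{recent} {question}".strip())
        answer = self.generate(self.build_prompt(question, chunks))
        self.history.append((question, answer))
        return answer
```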


Summary

RAG pipeline components:

| Component | Purpose | Key Design Decision |
| --- | --- | --- |
| Chunking | Split docs into searchable pieces | Size + overlap (200 chars, 50 overlap) |
| Embedding | Represent meaning as vectors | LSA for local; BERT for production |
| Vector store | Fast similarity search | ChromaDB; Pinecone in production |
| Retriever | Find relevant chunks | Hybrid search for best recall + precision |
| Prompt builder | Ground LLM in retrieved context | Always cite sources |
| Evaluator | Measure retrieval quality | Hit rate, MRR, faithfulness |

Key Takeaways:

  • Always evaluate retrieval before evaluating generation

  • Hybrid search outperforms pure keyword or pure semantic alone

  • Chunk overlap prevents boundary-spanning content from being missed

  • Add conversation history for contextual follow-up questions
