Skip to Content
08 RAG

RAG

🎯 Overview

Combine your skills from previous phases to build production-grade RAG systems!

Prerequisites:

  • ✅ Tokenization (Phase 4)
  • ✅ Embeddings (Phase 5)
  • ✅ Neural Networks (Phase 6)
  • ✅ Vector Databases (Phase 7)

Time: 3-4 weeks | 60-80 hours
Outcome: Build AI applications that can query your knowledge base

This is one of the core application-building phases in the repo. It is where embeddings, vector search, prompting, and evaluation start behaving like a real system instead of isolated topics.


📚 What You’ll Learn

Core RAG Concepts

  • RAG architecture and pipeline
  • Document processing and chunking strategies
  • Retrieval methods (dense, sparse, hybrid)
  • Context management and prompt construction
  • Re-ranking and result filtering
  • LLM integration (OpenAI, Anthropic, local models)

Advanced RAG Techniques

  • Hybrid search (vector + keyword)
  • Query transformation and expansion
  • Multi-query retrieval
  • Parent-document retrieval
  • Self-query and metadata filtering
  • Conversation memory and context
  • HyDE and hypothetical-answer retrieval
  • Contextual compression and segment extraction
  • Cross-encoder reranking
  • Hierarchical retrieval, RAPTOR, and parent-child indexing
  • Corrective RAG (CRAG) and self-reflective retrieval loops
  • GraphRAG, multimodal RAG, and agentic retrieval

🗂️ Module Structure

08-rag/ ├── 01_START_HERE.ipynb # RAG overview and quick demo ├── 02_basic_rag.ipynb # Simple RAG from scratch ├── 03_document_processing.ipynb # Chunking strategies ├── 04_langchain_rag.ipynb # Using LangChain framework ├── 05_llamaindex_rag.ipynb # Using LlamaIndex framework ├── 06_advanced_retrieval.ipynb # Hybrid search, re-ranking ├── 07_conversation_rag.ipynb # Chat with memory ├── 08_evaluation.ipynb # RAG evaluation metrics ├── 09_hyde_reranking.ipynb # HyDE-style query expansion plus reranking ├── 10_rag_evaluation_playbook.md # How to benchmark RAG improvements ├── 11_rag_technique_selection.md # How to choose the right RAG upgrade ├── 12_advanced_retrieval.ipynb # Parent-child retrieval, ensemble ├── 13_graphrag_visual_rag.ipynb # GraphRAG and multimodal RAG ├── 14_corrective_rag.ipynb # CRAG-style retrieval grading, retry, abstention ├── 15_parent_child_retrieval.ipynb # Structured retrieval with chunk-to-parent expansion ├── 16_raptor_retrieval.ipynb # RAPTOR-style hierarchical summary-tree retrieval ├── 08_assignment.md # Phase assignment ├── 10_challenges.md # Hands-on challenges └── README.md # This file

🚀 Quick Start

1. Basic RAG Pipeline

# The fundamental RAG flow: # 1. Index documents → embeddings → vector DB # 2. User query → embedding → similarity search # 3. Retrieved docs + query → LLM → answer from sentence_transformers import SentenceTransformer from your_vector_db import VectorDB # Chroma, Qdrant, etc. from openai import OpenAI # 1. Index your documents # Use any embedding model - see 05-embeddings/09_embedding_comparison.md for options # API: Gemini Embedding (cheapest + best), Voyage 3.5, or OpenAI # Local: Qwen3-Embedding, BGE-M3, or all-MiniLM-L6-v2 model = SentenceTransformer('all-MiniLM-L6-v2') # local, fast docs = ["Your documents here..."] embeddings = model.encode(docs) db.add(documents=docs, embeddings=embeddings) # 2. Retrieve relevant context query = "What is RAG?" query_embedding = model.encode(query) results = db.search(query_embedding, top_k=3) # 3. Generate answer with LLM (Claude, GPT, Gemini, or local) context = "\n".join(results) prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:" response = llm.generate(prompt)

📋 Learning Path

Week 1: RAG Fundamentals

  • Complete 00_START_HERE.ipynb
  • Build basic RAG in 01_basic_rag.ipynb
  • Learn chunking strategies in 02_document_processing.ipynb
  • Project: Simple Q&A on your documents

Week 2: RAG Frameworks

  • Learn LangChain in 03_langchain_rag.ipynb
  • Explore LlamaIndex in 04_llamaindex_rag.ipynb
  • Compare frameworks and choose your favorite
  • Project: Build a research paper assistant

Week 3: Advanced Techniques

  • Implement hybrid search in 05_advanced_retrieval.ipynb
  • Add conversation memory in 06_conversation_rag.ipynb
  • Learn evaluation in 07_evaluation.ipynb
  • Project: Code search system for your repos

Week 4: Production Project

  • Build end-to-end RAG application
  • Add proper error handling
  • Implement caching and optimization
  • Deploy as API (preview of Phase 9)
  • Capstone: Personal knowledge assistant

Optional Week 5: Modern RAG Deep Dives

  • Explore HyDE / query rewriting / query decomposition patterns
  • Work through 08_hyde_reranking.ipynb to compare baseline retrieval vs HyDE + reranking
  • Compare reranking, contextual compression, and relevant-segment extraction
  • Work through 11_corrective_rag.ipynb to add retrieval grading, retry logic, and abstention
  • Work through 12_parent_child_retrieval.ipynb before moving to RAPTOR or GraphRAG
  • Work through 13_raptor_retrieval.ipynb to compare flat, parent-child, and tree-based retrieval
  • Study CRAG, Self-RAG, and retrieval-with-feedback loops
  • Review RAPTOR, GraphRAG, and multimodal RAG architectures
  • Build a small benchmark to compare at least 3 advanced techniques

🧭 Modern RAG Technique Map

The cloned RAG_Techniques repository is strong because it does not treat RAG as one pattern. It treats RAG as a family of retrieval control strategies. Use this map to understand which techniques matter and when.

ProblemTechniques to StudyWhy It HelpsWhen to Use
Queries are vague or underspecifiedQuery rewriting, query decomposition, multi-query retrieval, HyDEMakes retrieval better aligned with user intentUser asks short, ambiguous, or multi-part questions
Chunks lose too much contextSemantic chunking, proposition chunking, contextual headers, window expansionPreserves meaning while keeping retrieval preciseLong docs, technical manuals, research papers
Retriever finds partly-right docsHybrid retrieval, reranking, contextual compression, segment extractionImproves top-k quality before the LLM sees contextLarge corpora, noisy search results, enterprise docs
Questions require structure beyond flat chunksParent-child retrieval, hierarchical indices, RAPTOR, GraphRAGRetrieves summaries, entities, relationships, and larger context blocksMulti-hop reasoning, long reports, knowledge graphs
System hallucinates or retrieves weak evidenceReliable RAG, CRAG, Self-RAG, feedback loopsAdds validation and correction before final answerHigh-stakes workflows, compliance, research, support
Queries span text, tables, and imagesMultimodal RAG, caption-based retrieval, visual RAG, ColPali-style retrievalBrings non-text content into the retrieval loopPDFs, dashboards, slide decks, diagrams
Workflow needs tools and planningAgentic RAG, retrieval orchestration, tool selectionLets the system choose retrieval tools dynamicallyComplex research agents, enterprise copilots

Suggested progression

  1. Learn the baseline pipeline first: chunk, embed, retrieve, answer.
  2. Improve retrieval quality next: hybrid search, reranking, metadata filters.
  3. Improve query understanding after that: rewriting, multi-query, HyDE.
  4. Add reliability controls next: compression, validation, CRAG or Self-RAG.
  5. Only then move into GraphRAG, agentic RAG, and multimodal retrieval.

This ordering matters. Most weak RAG systems fail because teams jump to advanced architecture before fixing chunking, retrieval quality, and evaluation.

Companion guide

Use 11_rag_technique_selection.md if you want a compact decision guide for choosing between HyDE, reranking, compression, RAPTOR, CRAG, Self-RAG, and GraphRAG.

Use 10_rag_evaluation_playbook.md if you want a practical framework for benchmarking retrieval quality, answer quality, latency, and failure behavior.

How To Use This Phase Well

  • Build one baseline RAG pipeline before exploring advanced variants.
  • Treat retrieval evaluation as required work, not optional polish.
  • Compare upgrades against a simple baseline so you can tell whether complexity actually helped.
  • Use advanced techniques selectively based on failure modes, not because they are popular.

🛠️ Technologies You’ll Use

LLM Frameworks:

  • LangChain - Most popular, extensive ecosystem
  • LlamaIndex - Best for document indexing
  • Haystack - Production-focused

LLM Providers:

  • OpenAI (GPT-5.4, GPT-4.1, GPT-4.1-mini)
  • Anthropic (Claude Sonnet 4.6, Haiku 4.5)
  • Google (Gemini 3.1 Pro, Flash)
  • Local models (Qwen 3, Llama 4, DeepSeek R1 via Ollama)

Vector Databases:

  • Use what you learned in Phase 7!
  • Chroma, Qdrant, Weaviate, Milvus

Embeddings:

  • OpenAI embeddings (text-embedding-3-small/large)
  • Sentence Transformers (all-MiniLM-L6-v2, all-mpnet-base-v2)
  • Cohere embeddings

📊 Key Concepts Explained

1. RAG Pipeline

2. Chunking Strategies

Fixed-size chunks:

chunk_size = 512 # tokens or characters overlap = 50 # overlap between chunks

Semantic chunks:

  • Split by paragraphs, sentences
  • Preserve document structure
  • Maintain context boundaries

Recursive splitting:

  • Try different separators (\n\n, \n, ., space)
  • Preserve hierarchy

3. Retrieval Methods

Dense (Vector Search):

  • Semantic similarity
  • Works for paraphrased queries
  • Requires embeddings

Sparse (Keyword Search):

  • BM25, TF-IDF
  • Exact keyword matching
  • Fast and interpretable

Hybrid:

  • Combine both approaches
  • Re-rank with RRF (Reciprocal Rank Fusion)
  • Best of both worlds

4. What Upgrades a Good RAG System into a Strong One

Query-side upgrades:

  • Rewrite vague questions into standalone queries
  • Generate multiple retrieval queries and merge the results
  • Use HyDE when the question is abstract and semantic similarity is weak

Document-side upgrades:

  • Use semantic or proposition chunking when fixed windows lose meaning
  • Add headers, summaries, or parent references to each chunk
  • Use hierarchical retrieval for long documents and section-level reasoning

Ranking-side upgrades:

  • Retrieve broad candidate sets first
  • Re-rank with a cross-encoder or reranker model
  • Compress context so the generator only sees the best evidence

Control-loop upgrades:

  • Detect low-confidence retrieval before answering
  • Retry with transformed queries when the first pass is weak
  • Add answer verification or evidence grading for high-risk use cases

🎯 Projects

Project 1: Personal Documentation Q&A

Build a chatbot that answers questions about your personal notes, docs, PDFs.

Features:

  • Upload PDFs, TXTs, Markdown files
  • Chunk and embed documents
  • Conversational interface
  • Source citation

Project 2: Code Search Engine

Semantic search across your GitHub repositories.

Features:

  • Index code files (Python, JavaScript, etc.)
  • Search by intent (“how to connect to database?”)
  • Show relevant code snippets
  • Explain code functionality

Project 3: Research Assistant

Query academic papers and scientific literature.

Features:

  • Process research papers (PDFs)
  • Extract citations and references
  • Summarize papers
  • Compare multiple papers

Project 4: Customer Support Bot

RAG-powered FAQ system.

Features:

  • Index support documentation
  • Handle common questions
  • Escalate to human when needed
  • Track conversation context

Build a RAG system that combines multiple retrieval strategies and exposes evidence quality.

Features:

  • Query rewriting and multi-query retrieval
  • Hybrid retrieval plus reranking
  • Metadata-aware filtering
  • Confidence scoring and answer verification
  • Failure routing: answer, abstain, or ask follow-up

📈 Evaluation Metrics

Retrieval Quality

  • Precision@K: Relevant docs in top K results
  • Recall@K: % of relevant docs retrieved
  • MRR (Mean Reciprocal Rank): Position of first relevant result
  • NDCG: Normalized Discounted Cumulative Gain

Generation Quality

  • Faithfulness: Answer grounded in context
  • Relevance: Answer addresses the question
  • Correctness: Factually accurate
  • Human evaluation: User satisfaction

System Metrics

  • Latency: Response time
  • Cost: API costs per query
  • Cache hit rate: Efficiency

💡 Best Practices

Document Processing

✅ Chunk size: 256-1024 tokens (experiment!)
✅ Overlap: 10-20% of chunk size
✅ Preserve metadata (source, date, author)
✅ Clean text (remove headers, footers)

Retrieval

✅ Retrieve 3-10 documents (balance context vs noise)
✅ Use hybrid search when possible
✅ Re-rank results for better quality
✅ Filter by metadata when relevant

Prompting

✅ Provide clear instructions
✅ Include relevant context only
✅ Ask LLM to cite sources
✅ Handle “I don’t know” cases

Production

✅ Cache embeddings and results
✅ Monitor LLM costs
✅ Implement rate limiting
✅ Add error handling and retries ✅ Benchmark retrieval variants before adding architectural complexity ✅ Track answer faithfulness separately from answer fluency ✅ Keep a failure set of hard questions and regressions ✅ Prefer simpler retrieval improvements before adding agents or graphs


🔗 Resources

Documentation

Papers

Courses

Tools


✅ Completion Checklist

Before moving to Phase 9 (MLOps), you should be able to:

  • Explain RAG architecture and benefits
  • Process and chunk documents effectively
  • Build basic RAG pipeline from scratch
  • Use LangChain or LlamaIndex
  • Implement hybrid search (dense + sparse)
  • Add conversation memory to chatbots
  • Evaluate RAG system quality

What Comes Next

  • Continue to ../09-mlops/README.md if you want deployment, monitoring, and production operations.
  • Continue to ../11-prompt-engineering/README.md if you want stronger prompt control inside retrieval systems.
  • Continue to ../15-ai-agents/README.md if you want retrieval as one tool inside larger agent workflows.
  • Continue to ../16-model-evaluation/README.md if you want stronger measurement beyond ad hoc quality checks.
  • Explain when to use HyDE, reranking, contextual compression, or GraphRAG
  • Diagnose retrieval failures and choose the right fix
  • Deploy a working RAG application
  • Understand cost/latency tradeoffs
  • Handle edge cases and errors

🎓 What’s Next?

Phase 9: MLOps & Production

  • Deploy RAG as scalable API
  • Monitor performance and costs
  • CI/CD for ML systems
  • Cloud deployment (AWS, Azure, GCP)

Phase 10: Specializations

  • Multimodal RAG (images + text)
  • Agent systems with RAG
  • Advanced prompt engineering

Ready to build your first RAG system? → Start with 00_START_HERE.ipynb

Questions? → Check the assignment.md, challenges.md, 08_rag_technique_selection.md, and 08_rag_evaluation_playbook.md for practice, technique selection, and benchmarking

🚀 Let’s build intelligent systems that can learn from your data!

Last updated on