RAG

🎯 Overview

Combine your skills from previous phases to build production-grade RAG systems!

Prerequisites:

✅ Tokenization (Phase 4)
✅ Embeddings (Phase 5)
✅ Neural Networks (Phase 6)
✅ Vector Databases (Phase 7)

Time: 3-4 weeks | 60-80 hours
Outcome: Build AI applications that can query your knowledge base

This is one of the core application-building phases in the repo. It is where embeddings, vector search, prompting, and evaluation start behaving like a real system instead of isolated topics.

📚 What You’ll Learn

Core RAG Concepts

RAG architecture and pipeline
Document processing and chunking strategies
Retrieval methods (dense, sparse, hybrid)
Context management and prompt construction
Re-ranking and result filtering
LLM integration (OpenAI, Anthropic, local models)

Advanced RAG Techniques

🗂️ Module Structure


08-rag/
├── 01_START_HERE.ipynb           # RAG overview and quick demo
├── 02_basic_rag.ipynb             # Simple RAG from scratch
├── 03_document_processing.ipynb   # Chunking strategies
├── 04_langchain_rag.ipynb         # Using LangChain framework
├── 05_llamaindex_rag.ipynb        # Using LlamaIndex framework
├── 06_advanced_retrieval.ipynb    # Hybrid search, re-ranking
├── 07_conversation_rag.ipynb      # Chat with memory
├── 08_evaluation.ipynb            # RAG evaluation metrics
├── 09_hyde_reranking.ipynb        # HyDE-style query expansion plus reranking
├── 10_rag_evaluation_playbook.md  # How to benchmark RAG improvements
├── 11_rag_technique_selection.md  # How to choose the right RAG upgrade
├── 12_advanced_retrieval.ipynb    # Parent-child retrieval, ensemble
├── 13_graphrag_visual_rag.ipynb   # GraphRAG and multimodal RAG
├── 14_corrective_rag.ipynb        # CRAG-style retrieval grading, retry, abstention
├── 15_parent_child_retrieval.ipynb # Structured retrieval with chunk-to-parent expansion
├── 16_raptor_retrieval.ipynb       # RAPTOR-style hierarchical summary-tree retrieval
├── 08_assignment.md                  # Phase assignment
├── 10_challenges.md                  # Hands-on challenges
└── README.md                      # This file

🚀 Quick Start

1. Basic RAG Pipeline


# The fundamental RAG flow:
# 1. Index documents → embeddings → vector DB
# 2. User query → embedding → similarity search
# 3. Retrieved docs + query → LLM → answer
 
from sentence_transformers import SentenceTransformer
from your_vector_db import VectorDB  # Chroma, Qdrant, etc.
from openai import OpenAI
 
# 1. Index your documents
# Use any embedding model - see 05-embeddings/09_embedding_comparison.md for options
# API: Gemini Embedding (cheapest + best), Voyage 3.5, or OpenAI
# Local: Qwen3-Embedding, BGE-M3, or all-MiniLM-L6-v2
model = SentenceTransformer('all-MiniLM-L6-v2')  # local, fast
docs = ["Your documents here..."]
embeddings = model.encode(docs)
db.add(documents=docs, embeddings=embeddings)
 
# 2. Retrieve relevant context
query = "What is RAG?"
query_embedding = model.encode(query)
results = db.search(query_embedding, top_k=3)
 
# 3. Generate answer with LLM (Claude, GPT, Gemini, or local)
context = "\n".join(results)
prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
response = llm.generate(prompt)

📋 Learning Path

Week 1: RAG Fundamentals

Complete 00_START_HERE.ipynb
Build basic RAG in 01_basic_rag.ipynb
Learn chunking strategies in 02_document_processing.ipynb
Project: Simple Q&A on your documents

Week 2: RAG Frameworks

Learn LangChain in 03_langchain_rag.ipynb
Explore LlamaIndex in 04_llamaindex_rag.ipynb
Compare frameworks and choose your favorite
Project: Build a research paper assistant

Week 3: Advanced Techniques

Implement hybrid search in 05_advanced_retrieval.ipynb
Add conversation memory in 06_conversation_rag.ipynb
Learn evaluation in 07_evaluation.ipynb
Project: Code search system for your repos

Week 4: Production Project

Build end-to-end RAG application
Add proper error handling
Implement caching and optimization
Deploy as API (preview of Phase 9)
Capstone: Personal knowledge assistant

Optional Week 5: Modern RAG Deep Dives

Explore HyDE / query rewriting / query decomposition patterns
Work through 08_hyde_reranking.ipynb to compare baseline retrieval vs HyDE + reranking
Compare reranking, contextual compression, and relevant-segment extraction
Work through 11_corrective_rag.ipynb to add retrieval grading, retry logic, and abstention
Work through 12_parent_child_retrieval.ipynb before moving to RAPTOR or GraphRAG
Work through 13_raptor_retrieval.ipynb to compare flat, parent-child, and tree-based retrieval
Study CRAG, Self-RAG, and retrieval-with-feedback loops
Review RAPTOR, GraphRAG, and multimodal RAG architectures
Build a small benchmark to compare at least 3 advanced techniques

🧭 Modern RAG Technique Map

The cloned RAG_Techniques repository is strong because it does not treat RAG as one pattern. It treats RAG as a family of retrieval control strategies. Use this map to understand which techniques matter and when.

Problem	Techniques to Study	Why It Helps	When to Use
Queries are vague or underspecified	Query rewriting, query decomposition, multi-query retrieval, HyDE	Makes retrieval better aligned with user intent	User asks short, ambiguous, or multi-part questions
Chunks lose too much context	Semantic chunking, proposition chunking, contextual headers, window expansion	Preserves meaning while keeping retrieval precise	Long docs, technical manuals, research papers
Retriever finds partly-right docs	Hybrid retrieval, reranking, contextual compression, segment extraction	Improves top-k quality before the LLM sees context	Large corpora, noisy search results, enterprise docs
Questions require structure beyond flat chunks	Parent-child retrieval, hierarchical indices, RAPTOR, GraphRAG	Retrieves summaries, entities, relationships, and larger context blocks	Multi-hop reasoning, long reports, knowledge graphs
System hallucinates or retrieves weak evidence	Reliable RAG, CRAG, Self-RAG, feedback loops	Adds validation and correction before final answer	High-stakes workflows, compliance, research, support
Queries span text, tables, and images	Multimodal RAG, caption-based retrieval, visual RAG, ColPali-style retrieval	Brings non-text content into the retrieval loop	PDFs, dashboards, slide decks, diagrams
Workflow needs tools and planning	Agentic RAG, retrieval orchestration, tool selection	Lets the system choose retrieval tools dynamically	Complex research agents, enterprise copilots

Suggested progression

Learn the baseline pipeline first: chunk, embed, retrieve, answer.
Improve retrieval quality next: hybrid search, reranking, metadata filters.
Improve query understanding after that: rewriting, multi-query, HyDE.
Add reliability controls next: compression, validation, CRAG or Self-RAG.
Only then move into GraphRAG, agentic RAG, and multimodal retrieval.

This ordering matters. Most weak RAG systems fail because teams jump to advanced architecture before fixing chunking, retrieval quality, and evaluation.

Companion guide

Use 11_rag_technique_selection.md if you want a compact decision guide for choosing between HyDE, reranking, compression, RAPTOR, CRAG, Self-RAG, and GraphRAG.

Use 10_rag_evaluation_playbook.md if you want a practical framework for benchmarking retrieval quality, answer quality, latency, and failure behavior.

How To Use This Phase Well

Build one baseline RAG pipeline before exploring advanced variants.
Treat retrieval evaluation as required work, not optional polish.
Compare upgrades against a simple baseline so you can tell whether complexity actually helped.
Use advanced techniques selectively based on failure modes, not because they are popular.

🛠️ Technologies You’ll Use

LLM Frameworks:

LangChain - Most popular, extensive ecosystem
LlamaIndex - Best for document indexing
Haystack - Production-focused

LLM Providers:

OpenAI (GPT-5.4, GPT-4.1, GPT-4.1-mini)
Anthropic (Claude Sonnet 4.6, Haiku 4.5)
Google (Gemini 3.1 Pro, Flash)
Local models (Qwen 3, Llama 4, DeepSeek R1 via Ollama)

Vector Databases:

Use what you learned in Phase 7!
Chroma, Qdrant, Weaviate, Milvus

Embeddings:

OpenAI embeddings (text-embedding-3-small/large)
Sentence Transformers (all-MiniLM-L6-v2, all-mpnet-base-v2)
Cohere embeddings

📊 Key Concepts Explained

1. RAG Pipeline

2. Chunking Strategies

Fixed-size chunks:


chunk_size = 512  # tokens or characters
overlap = 50      # overlap between chunks

Semantic chunks:

Split by paragraphs, sentences
Preserve document structure
Maintain context boundaries

Recursive splitting:

Try different separators (\n\n, \n, ., space)
Preserve hierarchy

3. Retrieval Methods

Dense (Vector Search):

Semantic similarity
Works for paraphrased queries
Requires embeddings

Sparse (Keyword Search):

BM25, TF-IDF
Exact keyword matching
Fast and interpretable

Hybrid:

Combine both approaches
Re-rank with RRF (Reciprocal Rank Fusion)
Best of both worlds

4. What Upgrades a Good RAG System into a Strong One

Query-side upgrades:

Rewrite vague questions into standalone queries
Generate multiple retrieval queries and merge the results
Use HyDE when the question is abstract and semantic similarity is weak

Document-side upgrades:

Use semantic or proposition chunking when fixed windows lose meaning
Add headers, summaries, or parent references to each chunk
Use hierarchical retrieval for long documents and section-level reasoning

Ranking-side upgrades:

Retrieve broad candidate sets first
Re-rank with a cross-encoder or reranker model
Compress context so the generator only sees the best evidence

Control-loop upgrades:

Detect low-confidence retrieval before answering
Retry with transformed queries when the first pass is weak
Add answer verification or evidence grading for high-risk use cases

🎯 Projects

Project 1: Personal Documentation Q&A

Build a chatbot that answers questions about your personal notes, docs, PDFs.

Features:

Upload PDFs, TXTs, Markdown files
Chunk and embed documents
Conversational interface
Source citation

Project 2: Code Search Engine

Semantic search across your GitHub repositories.

Features:

Index code files (Python, JavaScript, etc.)
Search by intent (“how to connect to database?”)
Show relevant code snippets
Explain code functionality

Project 3: Research Assistant

Query academic papers and scientific literature.

Features:

Process research papers (PDFs)
Extract citations and references
Summarize papers
Compare multiple papers

Project 4: Customer Support Bot

RAG-powered FAQ system.

Features:

Index support documentation
Handle common questions
Escalate to human when needed
Track conversation context

Project 5: Advanced Enterprise Search

Build a RAG system that combines multiple retrieval strategies and exposes evidence quality.

Features:

Query rewriting and multi-query retrieval
Hybrid retrieval plus reranking
Metadata-aware filtering
Confidence scoring and answer verification
Failure routing: answer, abstain, or ask follow-up

📈 Evaluation Metrics

Retrieval Quality

Precision@K: Relevant docs in top K results
Recall@K: % of relevant docs retrieved
MRR (Mean Reciprocal Rank): Position of first relevant result
NDCG: Normalized Discounted Cumulative Gain

Generation Quality

Faithfulness: Answer grounded in context
Relevance: Answer addresses the question
Correctness: Factually accurate
Human evaluation: User satisfaction

System Metrics

Latency: Response time
Cost: API costs per query
Cache hit rate: Efficiency

💡 Best Practices

Document Processing

✅ Chunk size: 256-1024 tokens (experiment!)
✅ Overlap: 10-20% of chunk size
✅ Preserve metadata (source, date, author)
✅ Clean text (remove headers, footers)

Retrieval

✅ Retrieve 3-10 documents (balance context vs noise)
✅ Use hybrid search when possible
✅ Re-rank results for better quality
✅ Filter by metadata when relevant

Prompting

✅ Provide clear instructions
✅ Include relevant context only
✅ Ask LLM to cite sources
✅ Handle “I don’t know” cases

Production

✅ Cache embeddings and results
✅ Monitor LLM costs
✅ Implement rate limiting
✅ Add error handling and retries ✅ Benchmark retrieval variants before adding architectural complexity ✅ Track answer faithfulness separately from answer fluency ✅ Keep a failure set of hard questions and regressions ✅ Prefer simpler retrieval improvements before adding agents or graphs

🔗 Resources

Documentation

Papers

Courses

Tools

Ollama - Run local LLMs
Chroma - Vector database
LangSmith - RAG evaluation
Ragas - Evaluate retrieval and answer quality
DeepEval - LLM evaluation for RAG pipelines

✅ Completion Checklist

Before moving to Phase 9 (MLOps), you should be able to:

Explain RAG architecture and benefits
Process and chunk documents effectively
Build basic RAG pipeline from scratch
Use LangChain or LlamaIndex
Implement hybrid search (dense + sparse)
Add conversation memory to chatbots
Evaluate RAG system quality

What Comes Next

Continue to ../09-mlops/README.md if you want deployment, monitoring, and production operations.
Continue to ../11-prompt-engineering/README.md if you want stronger prompt control inside retrieval systems.
Continue to ../15-ai-agents/README.md if you want retrieval as one tool inside larger agent workflows.
Continue to ../16-model-evaluation/README.md if you want stronger measurement beyond ad hoc quality checks.
Explain when to use HyDE, reranking, contextual compression, or GraphRAG
Diagnose retrieval failures and choose the right fix
Deploy a working RAG application
Understand cost/latency tradeoffs
Handle edge cases and errors

🎓 What’s Next?

Phase 9: MLOps & Production →

Deploy RAG as scalable API
Monitor performance and costs
CI/CD for ML systems
Cloud deployment (AWS, Azure, GCP)

Phase 10: Specializations →

Multimodal RAG (images + text)
Agent systems with RAG
Advanced prompt engineering

Ready to build your first RAG system? → Start with 00_START_HERE.ipynb

Questions? → Check the assignment.md, challenges.md, 08_rag_technique_selection.md, and 08_rag_evaluation_playbook.md for practice, technique selection, and benchmarking

🚀 Let’s build intelligent systems that can learn from your data!