RAG
🎯 Overview
Combine your skills from previous phases to build production-grade RAG systems!
Prerequisites:
- ✅ Tokenization (Phase 4)
- ✅ Embeddings (Phase 5)
- ✅ Neural Networks (Phase 6)
- ✅ Vector Databases (Phase 7)
Time: 3-4 weeks | 60-80 hours
Outcome: Build AI applications that can query your knowledge base
This is one of the core application-building phases in the repo. It is where embeddings, vector search, prompting, and evaluation start behaving like a real system instead of isolated topics.
📚 What You’ll Learn
Core RAG Concepts
- RAG architecture and pipeline
- Document processing and chunking strategies
- Retrieval methods (dense, sparse, hybrid)
- Context management and prompt construction
- Re-ranking and result filtering
- LLM integration (OpenAI, Anthropic, local models)
Advanced RAG Techniques
- Hybrid search (vector + keyword)
- Query transformation and expansion
- Multi-query retrieval
- Parent-document retrieval
- Self-query and metadata filtering
- Conversation memory and context
- HyDE and hypothetical-answer retrieval
- Contextual compression and segment extraction
- Cross-encoder reranking
- Hierarchical retrieval, RAPTOR, and parent-child indexing
- Corrective RAG (CRAG) and self-reflective retrieval loops
- GraphRAG, multimodal RAG, and agentic retrieval
🗂️ Module Structure
08-rag/
├── 01_START_HERE.ipynb # RAG overview and quick demo
├── 02_basic_rag.ipynb # Simple RAG from scratch
├── 03_document_processing.ipynb # Chunking strategies
├── 04_langchain_rag.ipynb # Using LangChain framework
├── 05_llamaindex_rag.ipynb # Using LlamaIndex framework
├── 06_advanced_retrieval.ipynb # Hybrid search, re-ranking
├── 07_conversation_rag.ipynb # Chat with memory
├── 08_evaluation.ipynb # RAG evaluation metrics
├── 09_hyde_reranking.ipynb # HyDE-style query expansion plus reranking
├── 10_rag_evaluation_playbook.md # How to benchmark RAG improvements
├── 11_rag_technique_selection.md # How to choose the right RAG upgrade
├── 12_advanced_retrieval.ipynb # Parent-child retrieval, ensemble
├── 13_graphrag_visual_rag.ipynb # GraphRAG and multimodal RAG
├── 14_corrective_rag.ipynb # CRAG-style retrieval grading, retry, abstention
├── 15_parent_child_retrieval.ipynb # Structured retrieval with chunk-to-parent expansion
├── 16_raptor_retrieval.ipynb # RAPTOR-style hierarchical summary-tree retrieval
├── 08_assignment.md # Phase assignment
├── 10_challenges.md # Hands-on challenges
└── README.md # This file🚀 Quick Start
1. Basic RAG Pipeline
# The fundamental RAG flow:
# 1. Index documents → embeddings → vector DB
# 2. User query → embedding → similarity search
# 3. Retrieved docs + query → LLM → answer
from sentence_transformers import SentenceTransformer
from your_vector_db import VectorDB # Chroma, Qdrant, etc.
from openai import OpenAI
# 1. Index your documents
# Use any embedding model - see 05-embeddings/09_embedding_comparison.md for options
# API: Gemini Embedding (cheapest + best), Voyage 3.5, or OpenAI
# Local: Qwen3-Embedding, BGE-M3, or all-MiniLM-L6-v2
model = SentenceTransformer('all-MiniLM-L6-v2') # local, fast
docs = ["Your documents here..."]
embeddings = model.encode(docs)
db.add(documents=docs, embeddings=embeddings)
# 2. Retrieve relevant context
query = "What is RAG?"
query_embedding = model.encode(query)
results = db.search(query_embedding, top_k=3)
# 3. Generate answer with LLM (Claude, GPT, Gemini, or local)
context = "\n".join(results)
prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
response = llm.generate(prompt)📋 Learning Path
Week 1: RAG Fundamentals
- Complete
00_START_HERE.ipynb - Build basic RAG in
01_basic_rag.ipynb - Learn chunking strategies in
02_document_processing.ipynb - Project: Simple Q&A on your documents
Week 2: RAG Frameworks
- Learn LangChain in
03_langchain_rag.ipynb - Explore LlamaIndex in
04_llamaindex_rag.ipynb - Compare frameworks and choose your favorite
- Project: Build a research paper assistant
Week 3: Advanced Techniques
- Implement hybrid search in
05_advanced_retrieval.ipynb - Add conversation memory in
06_conversation_rag.ipynb - Learn evaluation in
07_evaluation.ipynb - Project: Code search system for your repos
Week 4: Production Project
- Build end-to-end RAG application
- Add proper error handling
- Implement caching and optimization
- Deploy as API (preview of Phase 9)
- Capstone: Personal knowledge assistant
Optional Week 5: Modern RAG Deep Dives
- Explore HyDE / query rewriting / query decomposition patterns
- Work through
08_hyde_reranking.ipynbto compare baseline retrieval vs HyDE + reranking - Compare reranking, contextual compression, and relevant-segment extraction
- Work through
11_corrective_rag.ipynbto add retrieval grading, retry logic, and abstention - Work through
12_parent_child_retrieval.ipynbbefore moving to RAPTOR or GraphRAG - Work through
13_raptor_retrieval.ipynbto compare flat, parent-child, and tree-based retrieval - Study CRAG, Self-RAG, and retrieval-with-feedback loops
- Review RAPTOR, GraphRAG, and multimodal RAG architectures
- Build a small benchmark to compare at least 3 advanced techniques
🧭 Modern RAG Technique Map
The cloned RAG_Techniques repository is strong because it does not treat RAG as one pattern. It treats RAG as a family of retrieval control strategies. Use this map to understand which techniques matter and when.
| Problem | Techniques to Study | Why It Helps | When to Use |
|---|---|---|---|
| Queries are vague or underspecified | Query rewriting, query decomposition, multi-query retrieval, HyDE | Makes retrieval better aligned with user intent | User asks short, ambiguous, or multi-part questions |
| Chunks lose too much context | Semantic chunking, proposition chunking, contextual headers, window expansion | Preserves meaning while keeping retrieval precise | Long docs, technical manuals, research papers |
| Retriever finds partly-right docs | Hybrid retrieval, reranking, contextual compression, segment extraction | Improves top-k quality before the LLM sees context | Large corpora, noisy search results, enterprise docs |
| Questions require structure beyond flat chunks | Parent-child retrieval, hierarchical indices, RAPTOR, GraphRAG | Retrieves summaries, entities, relationships, and larger context blocks | Multi-hop reasoning, long reports, knowledge graphs |
| System hallucinates or retrieves weak evidence | Reliable RAG, CRAG, Self-RAG, feedback loops | Adds validation and correction before final answer | High-stakes workflows, compliance, research, support |
| Queries span text, tables, and images | Multimodal RAG, caption-based retrieval, visual RAG, ColPali-style retrieval | Brings non-text content into the retrieval loop | PDFs, dashboards, slide decks, diagrams |
| Workflow needs tools and planning | Agentic RAG, retrieval orchestration, tool selection | Lets the system choose retrieval tools dynamically | Complex research agents, enterprise copilots |
Suggested progression
- Learn the baseline pipeline first: chunk, embed, retrieve, answer.
- Improve retrieval quality next: hybrid search, reranking, metadata filters.
- Improve query understanding after that: rewriting, multi-query, HyDE.
- Add reliability controls next: compression, validation, CRAG or Self-RAG.
- Only then move into GraphRAG, agentic RAG, and multimodal retrieval.
This ordering matters. Most weak RAG systems fail because teams jump to advanced architecture before fixing chunking, retrieval quality, and evaluation.
Companion guide
Use 11_rag_technique_selection.md if you want a compact decision guide for choosing between HyDE, reranking, compression, RAPTOR, CRAG, Self-RAG, and GraphRAG.
Use 10_rag_evaluation_playbook.md if you want a practical framework for benchmarking retrieval quality, answer quality, latency, and failure behavior.
How To Use This Phase Well
- Build one baseline RAG pipeline before exploring advanced variants.
- Treat retrieval evaluation as required work, not optional polish.
- Compare upgrades against a simple baseline so you can tell whether complexity actually helped.
- Use advanced techniques selectively based on failure modes, not because they are popular.
🛠️ Technologies You’ll Use
LLM Frameworks:
- LangChain - Most popular, extensive ecosystem
- LlamaIndex - Best for document indexing
- Haystack - Production-focused
LLM Providers:
- OpenAI (GPT-5.4, GPT-4.1, GPT-4.1-mini)
- Anthropic (Claude Sonnet 4.6, Haiku 4.5)
- Google (Gemini 3.1 Pro, Flash)
- Local models (Qwen 3, Llama 4, DeepSeek R1 via Ollama)
Vector Databases:
- Use what you learned in Phase 7!
- Chroma, Qdrant, Weaviate, Milvus
Embeddings:
- OpenAI embeddings (text-embedding-3-small/large)
- Sentence Transformers (all-MiniLM-L6-v2, all-mpnet-base-v2)
- Cohere embeddings
📊 Key Concepts Explained
1. RAG Pipeline
2. Chunking Strategies
Fixed-size chunks:
chunk_size = 512 # tokens or characters
overlap = 50 # overlap between chunksSemantic chunks:
- Split by paragraphs, sentences
- Preserve document structure
- Maintain context boundaries
Recursive splitting:
- Try different separators (\n\n, \n, ., space)
- Preserve hierarchy
3. Retrieval Methods
Dense (Vector Search):
- Semantic similarity
- Works for paraphrased queries
- Requires embeddings
Sparse (Keyword Search):
- BM25, TF-IDF
- Exact keyword matching
- Fast and interpretable
Hybrid:
- Combine both approaches
- Re-rank with RRF (Reciprocal Rank Fusion)
- Best of both worlds
4. What Upgrades a Good RAG System into a Strong One
Query-side upgrades:
- Rewrite vague questions into standalone queries
- Generate multiple retrieval queries and merge the results
- Use HyDE when the question is abstract and semantic similarity is weak
Document-side upgrades:
- Use semantic or proposition chunking when fixed windows lose meaning
- Add headers, summaries, or parent references to each chunk
- Use hierarchical retrieval for long documents and section-level reasoning
Ranking-side upgrades:
- Retrieve broad candidate sets first
- Re-rank with a cross-encoder or reranker model
- Compress context so the generator only sees the best evidence
Control-loop upgrades:
- Detect low-confidence retrieval before answering
- Retry with transformed queries when the first pass is weak
- Add answer verification or evidence grading for high-risk use cases
🎯 Projects
Project 1: Personal Documentation Q&A
Build a chatbot that answers questions about your personal notes, docs, PDFs.
Features:
- Upload PDFs, TXTs, Markdown files
- Chunk and embed documents
- Conversational interface
- Source citation
Project 2: Code Search Engine
Semantic search across your GitHub repositories.
Features:
- Index code files (Python, JavaScript, etc.)
- Search by intent (“how to connect to database?”)
- Show relevant code snippets
- Explain code functionality
Project 3: Research Assistant
Query academic papers and scientific literature.
Features:
- Process research papers (PDFs)
- Extract citations and references
- Summarize papers
- Compare multiple papers
Project 4: Customer Support Bot
RAG-powered FAQ system.
Features:
- Index support documentation
- Handle common questions
- Escalate to human when needed
- Track conversation context
Project 5: Advanced Enterprise Search
Build a RAG system that combines multiple retrieval strategies and exposes evidence quality.
Features:
- Query rewriting and multi-query retrieval
- Hybrid retrieval plus reranking
- Metadata-aware filtering
- Confidence scoring and answer verification
- Failure routing: answer, abstain, or ask follow-up
📈 Evaluation Metrics
Retrieval Quality
- Precision@K: Relevant docs in top K results
- Recall@K: % of relevant docs retrieved
- MRR (Mean Reciprocal Rank): Position of first relevant result
- NDCG: Normalized Discounted Cumulative Gain
Generation Quality
- Faithfulness: Answer grounded in context
- Relevance: Answer addresses the question
- Correctness: Factually accurate
- Human evaluation: User satisfaction
System Metrics
- Latency: Response time
- Cost: API costs per query
- Cache hit rate: Efficiency
💡 Best Practices
Document Processing
✅ Chunk size: 256-1024 tokens (experiment!)
✅ Overlap: 10-20% of chunk size
✅ Preserve metadata (source, date, author)
✅ Clean text (remove headers, footers)
Retrieval
✅ Retrieve 3-10 documents (balance context vs noise)
✅ Use hybrid search when possible
✅ Re-rank results for better quality
✅ Filter by metadata when relevant
Prompting
✅ Provide clear instructions
✅ Include relevant context only
✅ Ask LLM to cite sources
✅ Handle “I don’t know” cases
Production
✅ Cache embeddings and results
✅ Monitor LLM costs
✅ Implement rate limiting
✅ Add error handling and retries
✅ Benchmark retrieval variants before adding architectural complexity
✅ Track answer faithfulness separately from answer fluency
✅ Keep a failure set of hard questions and regressions
✅ Prefer simpler retrieval improvements before adding agents or graphs
🔗 Resources
Documentation
Papers
- RAG: Retrieval-Augmented Generation
- Improving RAG with Hybrid Search
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
- Corrective Retrieval Augmented Generation
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Courses
Tools
- Ollama - Run local LLMs
- Chroma - Vector database
- LangSmith - RAG evaluation
- Ragas - Evaluate retrieval and answer quality
- DeepEval - LLM evaluation for RAG pipelines
✅ Completion Checklist
Before moving to Phase 9 (MLOps), you should be able to:
- Explain RAG architecture and benefits
- Process and chunk documents effectively
- Build basic RAG pipeline from scratch
- Use LangChain or LlamaIndex
- Implement hybrid search (dense + sparse)
- Add conversation memory to chatbots
- Evaluate RAG system quality
What Comes Next
- Continue to ../09-mlops/README.md if you want deployment, monitoring, and production operations.
- Continue to ../11-prompt-engineering/README.md if you want stronger prompt control inside retrieval systems.
- Continue to ../15-ai-agents/README.md if you want retrieval as one tool inside larger agent workflows.
- Continue to ../16-model-evaluation/README.md if you want stronger measurement beyond ad hoc quality checks.
- Explain when to use HyDE, reranking, contextual compression, or GraphRAG
- Diagnose retrieval failures and choose the right fix
- Deploy a working RAG application
- Understand cost/latency tradeoffs
- Handle edge cases and errors
🎓 What’s Next?
Phase 9: MLOps & Production →
- Deploy RAG as scalable API
- Monitor performance and costs
- CI/CD for ML systems
- Cloud deployment (AWS, Azure, GCP)
Phase 10: Specializations →
- Multimodal RAG (images + text)
- Agent systems with RAG
- Advanced prompt engineering
Ready to build your first RAG system? → Start with 00_START_HERE.ipynb
Questions? → Check the assignment.md, challenges.md, 08_rag_technique_selection.md, and 08_rag_evaluation_playbook.md for practice, technique selection, and benchmarking
🚀 Let’s build intelligent systems that can learn from your data!