Assignment: Build a Production-Ready RAG System
🎯 Objective
Build a complete Retrieval-Augmented Generation (RAG) system for a real-world use case. Your system should handle document ingestion, intelligent retrieval, and high-quality answer generation with proper evaluation metrics.
Estimated Time: 8-10 hours
Difficulty: ⭐⭐⭐⭐ Advanced
Suggested Pace: 1-2 weeks after completing the main RAG notebooks
📋 Requirements
Part 1: Document Processing Pipeline
Build a robust document ingestion system:
- Multi-format support: PDF, DOCX, TXT, Markdown, HTML
- Intelligent chunking:
  - Semantic chunking (keep related content together)
  - Overlapping chunks for context preservation
  - Metadata extraction (title, author, date, section)
- Text cleaning: Remove headers, footers, page numbers
- Deduplication: Detect and remove duplicate chunks
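Here is a minimal sketch of two of the requirements above: overlapping chunks and deduplication. It assumes chunk size is measured in characters (many pipelines count tokens instead). It also assumes "duplicate" means identical after lowercasing and collapsing whitespace. Both function names are illustrative and not part of the starter code. Near-duplicate detection (for example MinHash) would need more than exact hashing.

```python
import hashlib
import re


def split_with_overlap(text, chunk_size=512, chunk_overlap=50):
    """Fixed-size character windows; each chunk repeats the last
    chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def deduplicate_chunks(chunks):
    """Drop exact duplicates after normalizing case and whitespace,
    keeping the first occurrence of each."""
    seen = set()
    unique = []
    for chunk in chunks:
        normalized = re.sub(r"\s+", " ", chunk).strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

For example, `split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)` yields `["abcd", "defg", "ghij"]`. Each window starts 3 characters after the previous one, so consecutive chunks share one character.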
```python
class DocumentProcessor:
    def __init__(self, chunk_size=512, chunk_overlap=50):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def process_document(self, file_path):
        """Process document and return structured chunks."""
        # TODO: Implement
        pass

    def chunk_text(self, text, metadata=None):
        """Split text into semantic chunks."""
        # TODO: Implement semantic chunking
        pass

    def extract_metadata(self, document):
        """Extract document metadata."""
        # TODO: Implement
        pass
```
Part 2: Vector Database & Retrieval
Implement advanced retrieval strategies:
- Vector store setup: Use ChromaDB, Pinecone, or Weaviate
- Embedding generation: Use OpenAI embeddings or open-source alternatives
- Hybrid search: Combine dense (semantic) + sparse (keyword) retrieval
- Re-ranking: Implement cross-encoder re-ranking for top results
- Metadata filtering: Filter by date, author, document type
- One modern retrieval upgrade: Implement at least one of HyDE, query decomposition, contextual compression, parent-child retrieval, or relevant-segment extraction
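Reciprocal Rank Fusion (RRF) is one way to fuse the dense and sparse result lists. It uses only ranks, so raw scores never need normalizing. RRF is an alternative to the weighted-score fusion in the Advanced Tips, not a requirement. This sketch assumes each retriever returns a best-first list of document IDs. `k=60` is a commonly used default, not a value tuned for your data.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=10):
    """Fuse several best-first lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only ranks are used, so BM25 scores and cosine
    similarities never have to be put on the same scale.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in fused[:top_n]]
```

Fusing a dense ranking `["d1", "d2", "d3"]` with a sparse ranking `["d3", "d1", "d4"]` gives `["d1", "d3", "d2", "d4"]`. Documents found by both retrievers rise to the top.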
```python
from sentence_transformers import CrossEncoder


class RAGRetriever:
    def __init__(self, vector_store, embedding_model):
        self.vector_store = vector_store
        self.embedding_model = embedding_model
        self.reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

    def hybrid_search(self, query, top_k=10, alpha=0.5):
        """
        Combine semantic and keyword search.
        Args:
            query: User question
            top_k: Number of results
            alpha: Weight of the semantic score (0-1); keyword search gets 1 - alpha
        """
        # TODO: Implement hybrid search
        pass

    def rerank_results(self, query, candidates):
        """Re-rank retrieved documents using cross-encoder."""
        # TODO: Implement re-ranking
        pass

    def filter_by_metadata(self, results, filters):
        """Apply metadata filters to results."""
        # TODO: Implement filtering
        pass

    def advanced_retrieve(self, query):
        """Apply one modern retrieval upgrade beyond baseline hybrid search."""
        # Example options:
        # - HyDE or query rewriting before retrieval
        # - contextual compression after retrieval
        # - parent-child or hierarchical retrieval for long documents
        pass
```
Part 3: Answer Generation
Create an intelligent answer generator:
- Context assembly: Select and order the most relevant chunks
- Prompt engineering: Design effective RAG prompts
- Citation tracking: Include source references in answers
- Confidence scoring: Estimate answer confidence
- Fallback handling: Graceful handling when no good answer exists
- Abstention policy: Refuse to answer when evidence is weak or conflicting
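Here is a minimal sketch of an abstention policy driven by reranker scores. It assumes the scores have been mapped to [0, 1]; apply a sigmoid first if your reranker returns raw logits. The thresholds are placeholders to tune on your test set. The confidence value is a heuristic, not a calibrated probability. The sketch only handles weak evidence. Detecting conflicting evidence needs something more, such as an NLI check between chunks.

```python
def abstention_decision(scores, min_score=0.5, min_supporting=2):
    """Decide whether to answer, given relevance scores of retrieved chunks.

    scores: reranker scores assumed to lie in [0, 1].
    Returns a dict with "answer" (bool), "confidence" (float) and "reason".
    """
    if not scores:
        return {"answer": False, "confidence": 0.0, "reason": "no evidence retrieved"}
    ranked = sorted(scores, reverse=True)
    supporting = [s for s in ranked if s >= min_score]
    if not supporting:
        return {"answer": False, "confidence": ranked[0],
                "reason": "best chunk is below the relevance threshold"}
    # Heuristic confidence: the best score, discounted when fewer than
    # min_supporting chunks clear the threshold
    coverage = min(1.0, len(supporting) / min_supporting)
    confidence = ranked[0] * coverage
    if confidence < min_score:
        return {"answer": False, "confidence": confidence,
                "reason": "too little supporting evidence"}
    return {"answer": True, "confidence": confidence,
            "reason": "enough supporting evidence"}
```

With the defaults, `[0.9, 0.8, 0.1]` answers with confidence 0.9. By contrast, `[0.9, 0.2]` abstains, because only one chunk clears the threshold. Set `min_supporting=1` for corpora where single-passage answers are normal.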
```python
class AnswerGenerator:
    def __init__(self, llm_client, model="gpt-4-turbo"):
        self.client = llm_client
        self.model = model

    def generate_answer(self, query, context_chunks, include_citations=True):
        """
        Generate answer from retrieved context.
        Returns:
            {
                "answer": str,
                "citations": List[dict],
                "confidence": float,
                "sources_used": List[str]
            }
        """
        # TODO: Implement answer generation
        pass

    def build_prompt(self, query, context):
        """Build RAG prompt with query and context."""
        prompt = f"""Answer the question based on the context below.
If the answer is not in the context, say "I don't have enough information."
Context:
{context}
Question: {query}
Answer with citations:"""
        return prompt

    def estimate_confidence(self, answer, context):
        """Estimate answer confidence (0-1)."""
        # TODO: Implement confidence scoring
        pass
```
Part 4: Evaluation & Testing
Comprehensively evaluate your RAG system:
- Create test dataset: 50+ questions with ground truth answers
- Retrieval metrics:
  - Precision@K, Recall@K
  - Mean Reciprocal Rank (MRR)
  - Normalized Discounted Cumulative Gain (NDCG)
- Generation metrics:
  - ROUGE scores
  - BERTScore
  - Semantic similarity
  - Faithfulness (every claim in the answer is supported by the retrieved context)
- End-to-end metrics:
  - Answer accuracy
  - Latency (target: under 3 seconds per query)
  - Cost per query
- Failure analysis: Compare at least 3 failure modes (bad chunking, weak retrieval, unsupported answer)
- Ablation study: Measure baseline RAG vs. one advanced technique you added
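Here is a minimal sketch of the retrieval metrics listed above. It assumes binary relevance: each test question has a set of relevant chunk or document IDs. It also assumes your retriever returns a best-first list of IDs. Graded relevance would change the gains used in NDCG.

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG@k: DCG of this ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(retrieved[:k], start=1)
              if doc_id in relevant)
    # Ideal ranking puts all relevant items (up to k) at the top
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

Take `retrieved = ["a", "b", "c", "d"]` and `relevant = {"b", "d"}`. Precision@2, Recall@2 and the reciprocal rank are all 0.5. MRR is the mean of `reciprocal_rank` over every question in the test set.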
```python
class RAGEvaluator:
    def __init__(self, rag_system):
        self.rag_system = rag_system

    def evaluate_retrieval(self, test_set):
        """
        Evaluate retrieval quality.
        Returns metrics: precision, recall, MRR, NDCG
        """
        pass

    def evaluate_generation(self, test_set):
        """
        Evaluate answer quality.
        Returns metrics: ROUGE, BERTScore, faithfulness
        """
        pass

    def evaluate_end_to_end(self, test_set):
        """
        Full system evaluation.
        Returns: accuracy, latency, cost
        """
        pass

    def create_evaluation_report(self):
        """Generate comprehensive evaluation report."""
        pass

    def compare_variants(self, test_set):
        """Compare baseline retrieval against your improved retrieval pipeline."""
        pass
```
📊 Self-Review Guide
| Criteria | Strong | Working | Emerging | Needs revision |
|---|---|---|---|---|
| Document Processing | Multi-format support, semantic chunking, metadata | Good chunking with basic metadata | Simple chunking only | Broken or incomplete |
| Retrieval | Hybrid plus reranking plus one modern upgrade with clear gains | Good retrieval with some reranking | Basic semantic search | Poor retrieval quality |
| Generation | Citations, confidence, abstention, high-quality answers | Good answers with some citations | Basic answers generated | Poor answer quality |
| Evaluation | Comprehensive metrics, 50+ test cases, ablation and failure analysis | Good metrics with 30-50 test cases | Basic evaluation with 20-30 test cases | Incomplete evaluation |
🎯 Use Case Options (Choose One)
Option 1: Technical Documentation Assistant
Dataset: Python/React/AWS documentation
Challenge: Handle code examples, API references
Special requirement: Syntax highlighting in answers
Option 2: Research Paper Q&A
Dataset: arXiv papers in your field of interest
Challenge: Mathematical notation, citations
Special requirement: LaTeX rendering
Option 3: Company Knowledge Base
Dataset: Internal docs, wikis, Slack conversations
Challenge: Privacy, access control
Special requirement: User permissions
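For Option 3, permissions have to be enforced before any chunk reaches the LLM. Here is a minimal sketch of post-retrieval filtering. It assumes each chunk carries the set of groups allowed to read its source document, copied from that document's access list at ingestion time. The `Chunk` type and `allowed_groups` field are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    # Groups allowed to read the source document (copied from its ACL at ingestion)
    allowed_groups: frozenset = field(default_factory=frozenset)


def filter_by_permissions(chunks, user_groups):
    """Keep only chunks the user may read; chunks without an ACL are treated as private."""
    user_groups = set(user_groups)
    return [c for c in chunks if c.allowed_groups & user_groups]
```

When your vector store supports metadata filters, push the same condition into the query itself. Post-filtering alone can leave fewer than `top_k` chunks.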
Option 4: Legal Document Analysis
Dataset: Court cases, statutes, regulations
Challenge: Precise language, citations critical
Special requirement: Confidence scoring
Option 5: Medical Literature Search
Dataset: PubMed articles, clinical trials
Challenge: Technical terminology, accuracy critical
Special requirement: Source verification
🌟 Optional Extensions
Optional Extension 1: Conversational RAG
- Multi-turn conversations with context
- Follow-up question handling
- Conversation memory management
- Context window optimization
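A common pattern for follow-up questions is to rewrite them as standalone questions before retrieval. Here is a minimal sketch of the prompt side of that pattern, plus a simple memory that keeps recent turns within a character budget. `trim_history` and `build_condense_prompt` are hypothetical names. The history is assumed to be a list of `(role, text)` tuples. The LLM call that consumes the prompt is left to you.

```python
def trim_history(history, max_chars=2000):
    """Keep the most recent consecutive (role, text) turns that fit in max_chars."""
    kept, total = [], 0
    for role, text in reversed(history):
        if total + len(text) > max_chars:
            break  # stop at the first turn that no longer fits
        kept.append((role, text))
        total += len(text)
    return list(reversed(kept))


def build_condense_prompt(history, follow_up, max_chars=2000):
    """Prompt asking an LLM to rewrite a follow-up as a standalone search query."""
    transcript = "\n".join(
        f"{role}: {text}" for role, text in trim_history(history, max_chars)
    )
    return (
        "Rewrite the follow-up question so it can be understood without the conversation.\n"
        "Return only the rewritten question.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )
```

The LLM's rewritten question, not the raw follow-up, is what goes to the retriever.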
Optional Extension 2: Advanced Retrieval
- Query expansion with LLMs
- Multi-query retrieval
- Parent document retrieval
- Hypothetical Document Embeddings (HyDE)
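HyDE retrieves with the embedding of an LLM-written hypothetical answer instead of the embedding of the question. Here is a minimal sketch that passes the LLM, the embedding model and the vector-store search in as plain callables. Those parameters are assumptions that keep the sketch library-neutral, and the prompt wording is illustrative.

```python
def hyde_retrieve(query, generate, embed, search, top_k=5):
    """Hypothetical Document Embeddings (HyDE) with injected components.

    generate(prompt) -> str: any LLM call
    embed(text) -> vector: the same embedding model used to index the corpus
    search(vector, k) -> results from your vector store
    """
    # 1. Ask the LLM for a plausible passage that would answer the query
    hypothetical = generate(
        f"Write a short passage that answers the question.\nQuestion: {query}\nPassage:"
    )
    # 2. Embed the hypothetical passage instead of the raw question
    vector = embed(hypothetical)
    # 3. Retrieve real documents near that passage; the passage itself is never shown
    return search(vector, top_k)
```

The generated passage may contain errors. It only steers retrieval toward documents phrased like an answer, and the final answer is still grounded in the real documents it retrieves.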
Optional Extension 3: Deployment
- FastAPI backend
- Gradio/Streamlit frontend
- Docker containerization
- Deploy to cloud (Hugging Face Spaces/Railway)
Optional Extension 4: Monitoring & Analytics
- Query analytics dashboard
- User feedback collection
- A/B testing framework
- Cost tracking per user/query
Optional Extension 5: Reliability Loop
- Implement CRAG-style retrieval correction or answer verification
- Detect low-confidence retrieval and retry with a better query
- Route unsupported questions to abstain / follow-up instead of hallucinating
Optional Extension 6: Advanced Architecture
- RAPTOR, hierarchical retrieval, or GraphRAG prototype
- Explain why the architecture helps your dataset
- Compare it against your flat-chunk baseline
🧠 Suggested Technique Choices
If you are unsure what to add beyond baseline RAG, pick one of these:
- Best first upgrade: Hybrid retrieval + reranking.
- Best for ambiguous questions: Query rewriting or HyDE.
- Best for long structured documents: Parent-child retrieval or RAPTOR-style summarization.
- Best for noisy corpora: Contextual compression or relevant-segment extraction.
- Best for high-stakes settings: Evidence verification, CRAG-style correction, or abstention logic.
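Here is a simplified corrective-retrieval loop for the Reliability Loop extension (Optional Extension 5). CRAG as published uses a trained retrieval evaluator and can fall back to web search. This sketch keeps only the grade, rewrite and retry idea. The `retrieve`, `grade` and `rewrite` callables are assumptions: a reranker or LLM judge could serve as `grade`, and an LLM prompt as `rewrite`. The threshold is a placeholder.

```python
def retrieve_with_correction(query, retrieve, grade, rewrite,
                             max_attempts=2, threshold=0.5):
    """Retrieve, grade each chunk, and retry with a rewritten query if nothing passes.

    retrieve(query) -> list of chunks
    grade(question, chunk) -> relevance in [0, 1]
    rewrite(query) -> reformulated query
    Returns (relevant_chunks, final_query); empty chunks signal "abstain".
    """
    current = query
    for attempt in range(max_attempts):
        chunks = retrieve(current)
        # Grade against the user's original question, not the rewrite
        relevant = [c for c in chunks if grade(query, c) >= threshold]
        if relevant:
            return relevant, current
        if attempt < max_attempts - 1:
            current = rewrite(current)
    # Nothing passed the grader: the caller should abstain or ask a follow-up
    return [], current
```

Returning an empty list, not the best of the weak chunks, is deliberate. It hands control to the abstention path instead of letting the generator answer from irrelevant context.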
📦 Deliverables
Repository Structure
```
your-name-rag-system/
├── README.md # Setup and usage guide
├── requirements.txt # Dependencies
├── .env.example # Environment variables template
├── src/
│ ├── document_processor.py # Part 1
│ ├── retriever.py # Part 2
│ ├── generator.py # Part 3
│ ├── evaluator.py # Part 4
│ └── rag_system.py # Main system
├── data/
│ ├── documents/ # Source documents
│ ├── test_set.json # Evaluation questions
│ └── ground_truth.json # Expected answers
├── notebooks/
│ ├── 01_data_preparation.ipynb
│ ├── 02_retrieval_experiments.ipynb
│ ├── 03_generation_tuning.ipynb
│ └── 04_evaluation_analysis.ipynb
├── tests/
│ ├── test_processor.py
│ ├── test_retriever.py
│ ├── test_generator.py
│ └── test_integration.py
├── results/
│ ├── metrics.json
│ ├── error_analysis.md
│ └── charts/
└── EVALUATION_REPORT.md # Detailed analysis
```
Deliverables
- Working RAG System:
  - All 4 parts implemented
  - Passes all tests
  - CLI or API interface
  - Demo notebook
- Evaluation Report:
  - Methodology description
  - Metrics tables and charts
  - Error analysis
  - Optimization attempts
  - Conclusions
- Test Dataset:
  - 50+ diverse questions
  - Ground truth answers
  - Difficulty levels
  - Coverage of edge cases
- Demo:
  - 5-minute video or a live Gradio/Streamlit app
  - Show: ingestion → retrieval → generation → evaluation
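Here is one possible record layout for `test_set.json`, with a validator that checks the Test Dataset requirements listed above. The field names, the easy/medium/hard scale, and storing each question with its expected answer in one record are all assumptions. If you keep `ground_truth.json` separate, join the two files on `id` first. Including a few deliberately unanswerable questions, with the expected answer "I don't have enough information.", lets the same set exercise your abstention policy.

```python
REQUIRED_FIELDS = {"id", "question", "ground_truth", "difficulty", "relevant_doc_ids"}
DIFFICULTIES = {"easy", "medium", "hard"}

# Placeholder record showing the assumed layout (one element of the JSON list)
EXAMPLE_RECORD = {
    "id": "q001",
    "question": "<question about your corpus>",
    "ground_truth": "<expected answer>",
    "difficulty": "medium",
    "relevant_doc_ids": ["<doc or chunk id>"],
}


def validate_test_set(records, min_size=50):
    """Return a list of problems found in parsed test-set records (empty if OK)."""
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - set(rec)
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
        elif rec["difficulty"] not in DIFFICULTIES:
            problems.append(f"record {i}: unknown difficulty {rec['difficulty']!r}")
    if len(records) < min_size:
        problems.append(f"only {len(records)} records, expected {min_size}+")
    return problems
```

Load the file with `json.load` and pass the resulting list to `validate_test_set` before each evaluation run.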
💡 Advanced Tips
Tip 1: Semantic Chunking Strategy
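```python
import re


def semantic_chunking(text, max_chunk_size=512):
    """
    Chunk text at natural boundaries (size measured in characters).
    Priority: paragraphs > sentences. A single sentence longer than
    max_chunk_size is kept whole; add a word-level split if you need one.
    """
    # Paragraphs are the preferred unit; oversized ones fall back to sentences
    units = []
    for para in text.split('\n\n'):
        para = para.strip()
        if not para:
            continue
        if len(para) <= max_chunk_size:
            units.append(para)
        else:
            units.extend(re.split(r'(?<=[.!?])\s+', para))

    # Greedily pack units into chunks of at most max_chunk_size characters
    chunks = []
    current_chunk = ""
    for unit in units:
        # +2 accounts for the "\n\n" separator between packed units
        if current_chunk and len(current_chunk) + 2 + len(unit) > max_chunk_size:
            chunks.append(current_chunk)
            current_chunk = unit
        else:
            current_chunk = f"{current_chunk}\n\n{unit}" if current_chunk else unit
    if current_chunk:
        chunks.append(current_chunk)
    return chunks
```
Tip 2: Hybrid Search Implementation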
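```python
def hybrid_search(self, query, top_k=10, alpha=0.7):
    """
    Combine dense (embeddings) + sparse (BM25) search.
    alpha: weight for dense search (1-alpha for sparse)

    Assumes self.vector_store.similarity_search returns (doc, score) pairs
    where a higher score means more similar, and self.bm25 is a
    rank_bm25.BM25Okapi index built over self.documents (same order)
    with the same tokenizer used below.
    """
    def normalize(scores):
        # Min-max scale to [0, 1] so dense and BM25 scores are comparable
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {doc_id: 1.0 for doc_id in scores}
        return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

    # Dense retrieval (over-fetch so fusion has candidates to work with)
    query_embedding = self.embed(query)
    dense_results = self.vector_store.similarity_search(
        query_embedding, k=top_k * 2
    )
    dense_scores = normalize({doc.id: score for doc, score in dense_results})

    # Sparse retrieval (BM25): get_scores takes a tokenized query and
    # returns one score per indexed document
    bm25_scores = self.bm25.get_scores(query.lower().split())
    top_sparse = sorted(
        range(len(self.documents)), key=lambda i: bm25_scores[i], reverse=True
    )[:top_k * 2]
    sparse_scores = normalize(
        {self.documents[i].id: bm25_scores[i] for i in top_sparse}
    )

    # Combine with weighted scoring
    combined_scores = {}
    for doc_id, score in dense_scores.items():
        combined_scores[doc_id] = alpha * score
    for doc_id, score in sparse_scores.items():
        combined_scores[doc_id] = combined_scores.get(doc_id, 0) + (1 - alpha) * score

    # Sort and return top k
    sorted_docs = sorted(
        combined_scores.items(), key=lambda x: x[1], reverse=True
    )[:top_k]
    return [self.get_doc(doc_id) for doc_id, _ in sorted_docs]
```
Tip 3: Citation Extraction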
```python
import re


def generate_with_citations(self, query, chunks):
    """Generate answer with inline citations."""
    # Number each chunk so the model can cite it as [n]
    context = "\n\n".join(
        f"[{i + 1}] {chunk.text}" for i, chunk in enumerate(chunks)
    )
    prompt = f"""Answer using the numbered sources below.
Include citations like [1], [2] in your answer.

{context}

Question: {query}
Answer:"""
    answer = self.llm(prompt)

    # Extract citation numbers; ignore any that don't match a real chunk
    cited = sorted({int(c) for c in re.findall(r'\[(\d+)\]', answer)})
    valid = [c for c in cited if 1 <= c <= len(chunks)]
    sources = [chunks[c - 1].metadata for c in valid]
    return {
        "answer": answer,
        "sources": sources,
        "citation_count": len(valid),
        "invalid_citations": [c for c in cited if c not in valid],
    }
```
📚 Resources
Tools & Libraries
- Vector Stores: ChromaDB, Pinecone, Weaviate, Qdrant
- Embeddings: OpenAI, Cohere, Sentence-Transformers
- Frameworks: LangChain, LlamaIndex, Haystack
- Evaluation: RAGAS, DeepEval
🎓 Learning Objectives
After completing this assignment, you will:
- ✅ Build end-to-end RAG systems
- ✅ Implement advanced retrieval techniques
- ✅ Optimize for quality, speed, and cost
- ✅ Evaluate RAG systems comprehensively
- ✅ Deploy production-ready AI applications
💬 Support
- Discussion: GitHub Discussions
- Best help request: Include your retrieval setup, evaluation method, and one example failure case.
- Recommended order: Finish the core RAG notebooks before turning this into a larger build.
Good luck building your RAG system! 🚀