Assignment: Build a Production-Ready RAG System

🎯 Objective

Build a complete Retrieval-Augmented Generation (RAG) system for a real-world use case. Your system should handle document ingestion, intelligent retrieval, and high-quality answer generation with proper evaluation metrics.

Estimated Time: 8-10 hours
Difficulty: ⭐⭐⭐⭐ Advanced
Suggested Pace: 1-2 weeks after completing the main RAG notebooks


📋 Requirements

Part 1: Document Processing Pipeline

Build a robust document ingestion system:

  • Multi-format support: PDF, DOCX, TXT, Markdown, HTML
  • Intelligent chunking:
    • Semantic chunking (keep related content together)
    • Overlapping chunks for context preservation
    • Metadata extraction (title, author, date, section)
  • Text cleaning: Remove headers, footers, page numbers
  • Deduplication: Detect and remove duplicate chunks

```python
class DocumentProcessor:
    def __init__(self, chunk_size=512, chunk_overlap=50):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def process_document(self, file_path):
        """Process document and return structured chunks."""
        # TODO: Implement
        pass

    def chunk_text(self, text, metadata=None):
        """Split text into semantic chunks."""
        # TODO: Implement semantic chunking
        pass

    def extract_metadata(self, document):
        """Extract document metadata."""
        # TODO: Implement
        pass
```
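
The overlap and deduplication requirements can be prototyped without any external library. The sketch below is one possible starting point, not a reference solution: `sliding_window_chunks` and `deduplicate_chunks` are hypothetical helper names, chunk sizes are counted in words rather than tokens, and deduplication only catches exact matches after whitespace and case normalization (near-duplicate detection would need something like MinHash).

```python
import hashlib
import re


def sliding_window_chunks(words, chunk_size=8, chunk_overlap=2):
    """Split a list of words into overlapping windows."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def deduplicate_chunks(chunks):
    """Drop chunks whose normalized text was already seen (exact-match dedup)."""
    seen = set()
    unique = []
    for chunk in chunks:
        # Collapse whitespace and lowercase so trivial variants hash the same
        normalized = re.sub(r"\s+", " ", chunk).strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

Each window shares its last `chunk_overlap` words with the start of the next one, which is what preserves context across chunk boundaries.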

Part 2: Vector Database & Retrieval

Implement advanced retrieval strategies:

  • Vector store setup: Use ChromaDB, Pinecone, or Weaviate
  • Embedding generation: Use OpenAI embeddings or open-source alternatives
  • Hybrid search: Combine dense (semantic) + sparse (keyword) retrieval
  • Re-ranking: Implement cross-encoder re-ranking for top results
  • Metadata filtering: Filter by date, author, document type
  • One modern retrieval upgrade: Implement at least one of HyDE, query decomposition, contextual compression, parent-child retrieval, or relevant-segment extraction

```python
from sentence_transformers import CrossEncoder


class RAGRetriever:
    def __init__(self, vector_store, embedding_model):
        self.vector_store = vector_store
        self.embedding_model = embedding_model
        self.reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

    def hybrid_search(self, query, top_k=10, alpha=0.5):
        """
        Combine semantic and keyword search.

        Args:
            query: User question
            top_k: Number of results
            alpha: Weight for semantic vs keyword (0-1)
        """
        # TODO: Implement hybrid search
        pass

    def rerank_results(self, query, candidates):
        """Re-rank retrieved documents using cross-encoder."""
        # TODO: Implement re-ranking
        pass

    def filter_by_metadata(self, results, filters):
        """Apply metadata filters to results."""
        # TODO: Implement filtering
        pass

    def advanced_retrieve(self, query):
        """Apply one modern retrieval upgrade beyond baseline hybrid search."""
        # Example options:
        # - HyDE or query rewriting before retrieval
        # - contextual compression after retrieval
        # - parent-child or hierarchical retrieval for long documents
        pass
```
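
Metadata filtering is the easiest of these methods to get working first. The sketch below assumes each result is a dict with a `"metadata"` key, and it invents a small filter convention (exact value, list of allowed values, or a `{"min": ..., "max": ...}` range) purely for illustration. Real vector stores ship their own filter syntax, which you would normally push down into the query instead of filtering afterwards.

```python
from datetime import date


def matches_filters(metadata, filters):
    """Return True when a chunk's metadata satisfies every filter.

    Filter values (hypothetical convention for this sketch):
      - plain value: exact match
      - list/set/tuple: metadata value must be one of the options
      - dict with "min" and/or "max": inclusive range check
    """
    for key, expected in filters.items():
        if key not in metadata:
            return False  # a missing field never satisfies a filter
        value = metadata[key]
        if isinstance(expected, dict):
            if "min" in expected and value < expected["min"]:
                return False
            if "max" in expected and value > expected["max"]:
                return False
        elif isinstance(expected, (list, set, tuple)):
            if value not in expected:
                return False
        elif value != expected:
            return False
    return True


def filter_by_metadata(results, filters):
    """Keep only results whose "metadata" dict passes all filters."""
    return [r for r in results if matches_filters(r["metadata"], filters)]


# Example: PDFs or DOCX files dated 2024 or later
example_filters = {"doc_type": ["pdf", "docx"], "date": {"min": date(2024, 1, 1)}}
```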

Part 3: Answer Generation

Create an intelligent answer generator:

  • Context assembly: Select and order most relevant chunks
  • Prompt engineering: Design effective RAG prompts
  • Citation tracking: Include source references in answers
  • Confidence scoring: Estimate answer confidence
  • Fallback handling: Graceful handling when no good answer exists
  • Abstention policy: Refuse to answer when evidence is weak or conflicting

```python
class AnswerGenerator:
    def __init__(self, llm_client, model="gpt-4-turbo"):
        self.client = llm_client
        self.model = model

    def generate_answer(self, query, context_chunks, include_citations=True):
        """
        Generate answer from retrieved context.

        Returns:
            {
                "answer": str,
                "citations": List[dict],
                "confidence": float,
                "sources_used": List[str]
            }
        """
        # TODO: Implement answer generation
        pass

    def build_prompt(self, query, context):
        """Build RAG prompt with query and context."""
        prompt = f"""Answer the question based on the context below.
If the answer is not in the context, say "I don't have enough information."

Context:
{context}

Question: {query}

Answer with citations:"""
        return prompt

    def estimate_confidence(self, answer, context):
        """Estimate answer confidence (0-1)."""
        # TODO: Implement confidence scoring
        pass
```
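
Confidence scoring and abstention can start as a simple heuristic before you invest in anything model-based. The sketch below is one assumption-heavy option: it blends the best retrieval score (assumed to already lie in the 0-1 range) with the share of answer sentences that carry a `[n]` citation. The 50/50 weighting, the 0.5 threshold, and the function signatures are all illustrative choices, not values prescribed by the assignment.

```python
import re


def citation_coverage(answer):
    """Fraction of sentences in the answer that carry at least one [n] citation."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if re.search(r"\[\d+\]", s))
    return cited / len(sentences)


def estimate_confidence(answer, retrieval_scores):
    """Blend retrieval strength and citation coverage into a 0-1 score."""
    if not retrieval_scores:
        return 0.0
    # Clamp the best score in case the retriever returns values outside [0, 1]
    top_score = max(0.0, min(1.0, max(retrieval_scores)))
    return 0.5 * top_score + 0.5 * citation_coverage(answer)


def should_abstain(confidence, threshold=0.5):
    """Abstain when the combined confidence falls below the threshold."""
    return confidence < threshold
```

Tune the threshold on your test set: plot abstention rate against answer accuracy and pick the point where unsupported answers drop without refusing too many answerable questions.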

Part 4: Evaluation & Testing

Comprehensively evaluate your RAG system:

  • Create test dataset: 50+ questions with ground truth answers
  • Retrieval metrics:
    • Precision@K, Recall@K
    • Mean Reciprocal Rank (MRR)
    • Normalized Discounted Cumulative Gain (NDCG)
  • Generation metrics:
    • ROUGE scores
    • BERTScore
    • Semantic similarity
    • Faithfulness (no hallucinations)
  • End-to-end metrics:
    • Answer accuracy
    • Latency (target: under 3 seconds end to end)
    • Cost per query
  • Failure analysis: Compare at least 3 failure modes (bad chunking, weak retrieval, unsupported answer)
  • Ablation study: Measure baseline RAG vs. one advanced technique you added

```python
class RAGEvaluator:
    def __init__(self, rag_system):
        self.rag_system = rag_system

    def evaluate_retrieval(self, test_set):
        """
        Evaluate retrieval quality.

        Returns metrics: precision, recall, MRR, NDCG
        """
        pass

    def evaluate_generation(self, test_set):
        """
        Evaluate answer quality.

        Returns metrics: ROUGE, BERTScore, faithfulness
        """
        pass

    def evaluate_end_to_end(self, test_set):
        """
        Full system evaluation.

        Returns: accuracy, latency, cost
        """
        pass

    def create_evaluation_report(self):
        """Generate comprehensive evaluation report."""
        pass

    def compare_variants(self, test_set):
        """Compare baseline retrieval against your improved retrieval pipeline."""
        pass
```
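
The retrieval metrics listed above are short enough to implement by hand, which also makes them easy to unit test. The sketch below assumes binary relevance (a document is either relevant or not), that `retrieved` is a ranked list of unique document IDs, and that `relevant` is a set of IDs from your ground truth. The function names are illustrative.

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant ID, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(retrieved, relevant, k):
    """NDCG@k with binary relevance (gain 1 for relevant, 0 otherwise)."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    # Ideal ranking puts every relevant document at the top
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

If your ground truth has graded relevance (for example 0 to 3), replace the constant gain of 1 in `ndcg_at_k` with the grade for each document.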

📊 Self-Review Guide

| Criteria | Strong | Working | Emerging | Needs revision |
|---|---|---|---|---|
| Document Processing | Multi-format support, semantic chunking, metadata | Good chunking with basic metadata | Simple chunking only | Broken or incomplete |
| Retrieval | Hybrid plus reranking plus one modern upgrade with clear gains | Good retrieval with some reranking | Basic semantic search | Poor retrieval quality |
| Generation | Citations, confidence, abstention, high-quality answers | Good answers with some citations | Basic answers generated | Poor answer quality |
| Evaluation | Comprehensive metrics, 50+ test cases, ablation and failure analysis | Good metrics with 30-50 test cases | Basic evaluation with 20-30 test cases | Incomplete evaluation |

🎯 Use Case Options (Choose One)

Option 1: Technical Documentation Assistant

Dataset: Python/React/AWS documentation
Challenge: Handle code examples, API references
Special requirement: Syntax highlighting in answers

Option 2: Research Paper Q&A

Dataset: ArXiv papers in your field of interest
Challenge: Mathematical notation, citations
Special requirement: LaTeX rendering

Option 3: Company Knowledge Base

Dataset: Internal docs, wikis, Slack conversations
Challenge: Privacy, access control
Special requirement: User permissions

Option 4: Legal Documents

Dataset: Court cases, statutes, regulations
Challenge: Precise language, citations critical
Special requirement: Confidence scoring

Option 5: Medical Literature

Dataset: PubMed articles, clinical trials
Challenge: Technical terminology, accuracy critical
Special requirement: Source verification


🌟 Optional Extensions

Optional Extension 1: Conversational RAG

  • Multi-turn conversations with context
  • Follow-up question handling
  • Conversation memory management
  • Context window optimization
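
For conversation memory, a simple baseline is to keep only the most recent turns that fit a token budget. The sketch below assumes each turn is a dict with a `"content"` string and approximates token counts by whitespace splitting; swap in your model's real tokenizer through the `count_tokens` parameter. `trim_history` is a hypothetical helper name.

```python
def trim_history(turns, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent turns whose combined size fits in max_tokens.

    Walks backwards from the newest turn and stops at the first turn
    that would exceed the budget, so the kept history stays contiguous.
    """
    kept = []
    used = 0
    for turn in reversed(turns):
        cost = count_tokens(turn["content"])
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

A common next step is to summarize the dropped turns into a single short message instead of discarding them outright.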

Optional Extension 2: Advanced Retrieval

  • Query expansion with LLMs
  • Multi-query retrieval
  • Parent document retrieval
  • Hypothetical Document Embeddings (HyDE)
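
Query expansion and multi-query retrieval both produce several ranked lists that have to be merged into one. Reciprocal rank fusion (RRF) is a common way to do that because it needs only ranks, not comparable scores. The sketch below assumes each ranked list contains document IDs in best-first order; `k=60` is the constant commonly used with RRF, and the alphabetical tie-break is an arbitrary choice made here for determinism.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_n=None):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    documents ranked highly by several queries rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by ID to break ties deterministically
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused[:top_n] if top_n is not None else fused
```

In a multi-query setup you would call your retriever once per rewritten query and pass the resulting ID lists to `reciprocal_rank_fusion`.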

Optional Extension 5: Reliability Loop

  • Implement CRAG-style retrieval correction or answer verification
  • Detect low-confidence retrieval and retry with a better query
  • Route unsupported questions to abstain / follow-up instead of hallucinating
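
The retry-then-abstain flow described above can be sketched as a small control loop. This is a simplified illustration, not the CRAG algorithm itself: `retrieve` and `rewrite_query` are caller-supplied callables with assumed interfaces (`retrieve(query)` returns a list of `(chunk, score)` pairs, `rewrite_query(query)` returns a new query string), and the `min_score` threshold is a placeholder you would tune.

```python
def retrieve_with_retry(query, retrieve, rewrite_query, min_score=0.5, max_attempts=2):
    """Retry retrieval with a rewritten query when the best score is too low.

    Returns a dict with status "answer" (evidence is strong enough to
    generate from) or "abstain" (every attempt came back weak).
    """
    current = query
    for attempt in range(1, max_attempts + 1):
        results = retrieve(current)
        best = max((score for _, score in results), default=0.0)
        if best >= min_score:
            return {"status": "answer", "query": current,
                    "results": results, "attempts": attempt}
        if attempt < max_attempts:
            current = rewrite_query(current)  # e.g. an LLM-based rewrite
    return {"status": "abstain", "query": current,
            "results": [], "attempts": max_attempts}
```

Logging the `attempts` and `status` fields gives you a direct measure of how often the loop rescues a weak retrieval versus how often it abstains.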

Optional Extension 6: Advanced Architecture

  • RAPTOR, hierarchical retrieval, or GraphRAG prototype
  • Explain why the architecture helps your dataset
  • Compare it against your flat-chunk baseline

🧠 Suggested Technique Choices

If you are unsure what to add beyond baseline RAG, pick one of these:

  1. Best first upgrade: Hybrid retrieval + reranking.
  2. Best for ambiguous questions: Query rewriting or HyDE.
  3. Best for long structured documents: Parent-child retrieval or RAPTOR-style summarization.
  4. Best for noisy corpora: Contextual compression or relevant-segment extraction.
  5. Best for high-stakes settings: Evidence verification, CRAG-style correction, or abstention logic.
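
As an illustration of the parent-child option, the core step is mapping small retrieved chunks back to the larger sections they came from. The sketch below assumes each child hit is a dict with a `"parent_id"` key, that hits arrive in best-first order, and that `parents` maps IDs to full section text. All of these names are hypothetical.

```python
def expand_to_parents(child_hits, parents, max_parents=3):
    """Replace ranked child hits with their parent sections.

    Several children often share one parent, so parents are
    deduplicated while keeping the order of their best-ranked child.
    """
    seen = set()
    expanded = []
    for hit in child_hits:
        parent_id = hit["parent_id"]
        if parent_id in seen:
            continue
        seen.add(parent_id)
        expanded.append(parents[parent_id])
        if len(expanded) == max_parents:
            break
    return expanded
```

Small children keep the embedding match precise, while the returned parents give the generator enough surrounding context to answer.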

Optional Extension 3: Deployment

  • FastAPI backend
  • Gradio/Streamlit frontend
  • Docker containerization
  • Deploy to cloud (Hugging Face Spaces/Railway)

Optional Extension 4: Monitoring & Analytics

  • Query analytics dashboard
  • User feedback collection
  • A/B testing framework
  • Cost tracking per user/query
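
Cost tracking per user and per query needs little more than token counts and a price table. In the sketch below, `CostTracker` is a hypothetical class and the prices passed to it are placeholders, not real provider rates; look up current pricing for whichever model you use.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Accumulate LLM spend per user from token counts."""
    input_price_per_1k: float   # placeholder: cost per 1,000 prompt tokens
    output_price_per_1k: float  # placeholder: cost per 1,000 completion tokens
    totals: dict = field(default_factory=lambda: defaultdict(float))
    query_counts: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, user_id, input_tokens, output_tokens):
        """Record one query and return its cost."""
        cost = (
            (input_tokens / 1000) * self.input_price_per_1k
            + (output_tokens / 1000) * self.output_price_per_1k
        )
        self.totals[user_id] += cost
        self.query_counts[user_id] += 1
        return cost

    def average_cost(self, user_id):
        """Mean cost per query for a user, or 0.0 if they have no queries."""
        count = self.query_counts[user_id]
        return self.totals[user_id] / count if count else 0.0
```

The same numbers feed the "cost per query" metric in Part 4, so it is worth wiring this in before you run the evaluation.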

📦 Deliverables

Repository Structure

```
your-name-rag-system/
├── README.md                     # Setup and usage guide
├── requirements.txt              # Dependencies
├── .env.example                  # Environment variables template
├── src/
│   ├── document_processor.py     # Part 1
│   ├── retriever.py              # Part 2
│   ├── generator.py              # Part 3
│   ├── evaluator.py              # Part 4
│   └── rag_system.py             # Main system
├── data/
│   ├── documents/                # Source documents
│   ├── test_set.json             # Evaluation questions
│   └── ground_truth.json         # Expected answers
├── notebooks/
│   ├── 01_data_preparation.ipynb
│   ├── 02_retrieval_experiments.ipynb
│   ├── 03_generation_tuning.ipynb
│   └── 04_evaluation_analysis.ipynb
├── tests/
│   ├── test_processor.py
│   ├── test_retriever.py
│   ├── test_generator.py
│   └── test_integration.py
├── results/
│   ├── metrics.json
│   ├── error_analysis.md
│   └── charts/
└── EVALUATION_REPORT.md          # Detailed analysis
```

Deliverables

  1. Working RAG System:

    • All 4 parts implemented
    • Passes all tests
    • CLI or API interface
    • Demo notebook
  2. Evaluation Report:

    • Methodology description
    • Metrics tables and charts
    • Error analysis
    • Optimization attempts
    • Conclusions
  3. Test Dataset:

    • 50+ diverse questions
    • Ground truth answers
    • Difficulty levels
    • Coverage of edge cases
  4. Demo:

    • 5-minute video OR
    • Live Gradio/Streamlit app
    • Show: ingestion → retrieval → generation → evaluation

💡 Advanced Tips

Tip 1: Semantic Chunking Strategy

```python
def semantic_chunking(text, max_chunk_size=512):
    """
    Chunk text at semantic boundaries.

    Priority: paragraphs > sentences > words
    """
    # Try paragraph-level first
    paragraphs = text.split('\n\n')

    chunks = []
    current_chunk = ""

    for para in paragraphs:
        if len(current_chunk) + len(para) < max_chunk_size:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            # Note: a single paragraph longer than max_chunk_size is kept whole
            # here; split it further by sentences, then words, to honor the limit.
            current_chunk = para + "\n\n"

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks
```
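
The tip's docstring names a paragraphs > sentences > words priority, but the function only implements the paragraph level. One way to add the two fallback levels is a helper for paragraphs that exceed the limit. The sketch below is an assumption about how that could look: `split_oversized` is a hypothetical name, sizes are measured in characters to match the tip, and the regex sentence splitter is naive (it breaks on abbreviations such as "e.g.").

```python
import re


def split_oversized(paragraph, max_chunk_size=512):
    """Split one paragraph that exceeds max_chunk_size: sentences first, then words."""
    if len(paragraph) <= max_chunk_size:
        return [paragraph]

    pieces = []
    current = ""

    def add(unit):
        """Append unit to the current piece, starting a new piece when full."""
        nonlocal current
        candidate = f"{current} {unit}".strip()
        if len(candidate) > max_chunk_size and current:
            pieces.append(current)
            current = unit
        else:
            current = candidate

    for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
        if len(sentence) > max_chunk_size:
            # Sentence alone is too long: fall back to word-level packing.
            # A single word longer than the limit is still kept whole.
            for word in sentence.split():
                add(word)
        else:
            add(sentence)

    if current:
        pieces.append(current)
    return pieces
```

Inside `semantic_chunking`, you could call this helper on any paragraph longer than `max_chunk_size` and extend `chunks` with the pieces it returns.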

Tip 2: Hybrid Search Implementation

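```python
def hybrid_search(self, query, top_k=10, alpha=0.7):
    """
    Combine dense (embeddings) + sparse (BM25) search.

    alpha: weight for dense search (1 - alpha for sparse)
    """
    def normalize(scores):
        # Min-max scale to [0, 1] so dense and sparse scores are comparable
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {doc_id: 1.0 for doc_id in scores}
        return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

    # Dense retrieval (assumes the store returns (doc, similarity) pairs,
    # where a higher score means a closer match)
    query_embedding = self.embed(query)
    dense_results = self.vector_store.similarity_search(query_embedding, k=top_k * 2)
    dense_scores = normalize({doc.id: score for doc, score in dense_results})

    # Sparse retrieval (assumes self.bm25 is a rank_bm25 index built over
    # self.documents in the same order; BM25 expects a tokenized query)
    bm25_scores = self.bm25.get_scores(query.lower().split())
    ranked = sorted(
        zip(self.documents, bm25_scores), key=lambda pair: pair[1], reverse=True
    )[:top_k * 2]
    sparse_scores = normalize({doc.id: score for doc, score in ranked})

    # Combine with weighted scoring
    combined_scores = {}
    for doc_id, score in dense_scores.items():
        combined_scores[doc_id] = alpha * score
    for doc_id, score in sparse_scores.items():
        combined_scores[doc_id] = combined_scores.get(doc_id, 0) + (1 - alpha) * score

    # Sort and return top k
    sorted_docs = sorted(
        combined_scores.items(), key=lambda x: x[1], reverse=True
    )[:top_k]

    return [self.get_doc(doc_id) for doc_id, _ in sorted_docs]
```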

Tip 3: Citation Extraction

```python
import re


def generate_with_citations(self, query, chunks):
    """Generate answer with inline citations."""
    # Number each chunk
    context = "\n\n".join(
        f"[{i + 1}] {chunk.text}" for i, chunk in enumerate(chunks)
    )

    prompt = f"""Answer using the numbered sources below.
Include citations like [1], [2] in your answer.

{context}

Question: {query}

Answer:"""

    answer = self.llm(prompt)

    # Extract cited numbers in ascending order, dropping any out-of-range
    # numbers the model may have invented
    cited = sorted({int(c) for c in re.findall(r'\[(\d+)\]', answer)})
    valid = [n for n in cited if 1 <= n <= len(chunks)]
    sources = [chunks[n - 1].metadata for n in valid]

    return {
        "answer": answer,
        "sources": sources,
        "citation_count": len(valid),
    }
```

📚 Resources

Tools & Libraries

  • Vector Stores: ChromaDB, Pinecone, Weaviate, Qdrant
  • Embeddings: OpenAI, Cohere, Sentence-Transformers
  • Frameworks: LangChain, LlamaIndex, Haystack
  • Evaluation: RAGAS, DeepEval

🎓 Learning Objectives

After completing this assignment, you will:

  • ✅ Build end-to-end RAG systems
  • ✅ Implement advanced retrieval techniques
  • ✅ Optimize for quality, speed, and cost
  • ✅ Evaluate RAG systems comprehensively
  • ✅ Deploy production-ready AI applications

💬 Support

  • Discussion: GitHub Discussions 
  • Best help request: Include your retrieval setup, evaluation method, and one example failure case.
  • Recommended order: Finish the core RAG notebooks before turning this into a larger build.

Good luck building your RAG system! 🚀
