Assignment: Debug & Optimize a Broken ML Pipeline
Scope: capstone-style debugging project with optional extensions
Duration: 4-6 hours
Difficulty: ⭐⭐⭐⭐
📋 Objective
You’ve been given a broken ML pipeline with multiple issues affecting accuracy, speed, and reliability. Your task is to debug it systematically, optimize it, and document every improvement.
🎯 Learning Goals
- Apply systematic debugging workflows
- Diagnose data quality issues
- Profile and optimize code performance
- Fix model convergence problems
- Conduct comprehensive error analysis
📦 Dataset
Choose ONE of the following:
- UCI Adult Income (Classification)
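- California Housing (Regression; the scikit-learn version has 8 features, so engineer additional ones to meet the feature requirement below)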
- MNIST Digits (Multi-class Classification)
- Custom dataset (recommended only if you can document the problem, target, and debugging constraints clearly)
Requirements:
- Minimum 5,000 samples
- At least 10 features
- Real-world dataset (not synthetic)
🛠️ Part 1: Initial Assessment
Tasks:
- Run the provided buggy code (see below)
- Document all issues you observe
- Create a baseline report with:
  - Current accuracy/performance
  - Execution time
  - Memory usage
  - List of identified problems
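One way to capture the baseline time and memory figures is a small helper that wraps a single call. This is a minimal sketch using only the standard library; `measure_baseline` and `run_pipeline` are hypothetical names, and `tracemalloc` reports allocations it can trace rather than total process memory, so treat the memory figure as an estimate.

```python
import time
import tracemalloc


def measure_baseline(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def run_pipeline():
    # Hypothetical stand-in: replace with the entry point of the buggy pipeline.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    result, seconds, peak = measure_baseline(run_pipeline)
    print(f"time: {seconds:.3f} s, peak traced memory: {peak / 1e6:.2f} MB")
```

Recording these numbers before touching the code gives every later fix something concrete to be compared against.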
Deliverables:
- 01_initial_assessment.md
- Problem documentation
- Screenshots of errors/warnings
- Baseline performance metrics
Self-check:
- Comprehensive problem list
- Baseline metrics documented
- Clear documentation
🐛 Part 2: Data Quality Debugging
Tasks:
- Missing values analysis
  - Identify columns with missing data
  - Recommend handling strategy
  - Implement solution
- Duplicate detection
  - Find and remove duplicates
  - Document impact
- Outlier analysis
  - Use 2+ detection methods
  - Visualize outliers
  - Decide on handling strategy
- Distribution shift check
  - Compare train/test distributions
  - Statistical tests (K-S test)
  - Document findings
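The checks listed above can be sketched with pandas and SciPy. The helper names (`missing_report`, `iqr_outliers`, `zscore_outliers`, `ks_shift`), the thresholds, and the toy frame are all assumptions made for illustration; the right thresholds and handling strategy depend on your dataset.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, only for columns that have any."""
    frac = df.isna().mean()
    return frac[frac > 0].sort_values(ascending=False)


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask of values more than `threshold` std devs from the mean."""
    z = (s - s.mean()) / s.std(ddof=0)
    return z.abs() > threshold


def ks_shift(train: pd.DataFrame, test: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Two-sample K-S test for every numeric column shared by train and test."""
    rows = []
    for col in train.select_dtypes("number").columns.intersection(test.columns):
        stat, p = ks_2samp(train[col].dropna(), test[col].dropna())
        rows.append({"feature": col, "ks_stat": stat, "p_value": p, "shifted": p < alpha})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    # Toy frame for illustration only; substitute your chosen dataset.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"age": rng.normal(40, 10, 1000), "hours": rng.normal(38, 5, 1000)})
    df.loc[::50, "age"] = np.nan                            # inject missing values
    df.loc[3, "hours"] = 400.0                              # inject one extreme value
    df = pd.concat([df, df.iloc[:10]], ignore_index=True)   # inject duplicate rows

    print(missing_report(df))
    print("duplicate rows:", int(df.duplicated().sum()))
    df = df.drop_duplicates()
    print("IQR outliers:", int(iqr_outliers(df["hours"]).sum()))
    print("z-score outliers:", int(zscore_outliers(df["hours"]).sum()))
    print(ks_shift(df.iloc[:800], df.iloc[800:]))
```

Running several K-S tests at once inflates the chance of a false alarm, so a single flagged feature is a prompt to look at the plots, not proof of shift.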
Deliverables:
- 02_data_quality_report.ipynb
- Visualizations of data issues
- Before/after comparison
Self-check:
- Missing value handling
- Outlier detection and handling
- Distribution analysis
- Visualization quality
- Documentation
⚡ Part 3: Performance Profiling & Optimization
Tasks:
- CPU Profiling
  - Use cProfile to identify hotspots
  - Document top 5 slowest functions
  - Calculate time percentages
- Memory Profiling
  - Track memory usage
  - Identify memory leaks
  - Measure peak memory
- Optimization
  - Apply vectorization where possible
  - Implement batch processing
  - Use parallelization (n_jobs=-1)
  - Cache repeated computations
- Benchmarking
  - Before/after timing comparison
  - Document speedup factor
  - Memory reduction percentage
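As an illustration of the profile-then-vectorize loop, the sketch below profiles a row-by-row normalization with `cProfile` and compares it against a broadcasted NumPy version. The function names are hypothetical, and the loop copies the template's arithmetic purely to compare speed; whether row-wise normalization belongs in the pipeline at all is a separate question for your analysis.

```python
import cProfile
import io
import pstats
import time

import numpy as np


def normalize_loop(X):
    """Slow reference: standardize each row inside a Python loop."""
    out = []
    for row in X:
        out.append((row - row.mean()) / row.std())
    return np.array(out)


def normalize_vectorized(X):
    """Same arithmetic as normalize_loop, expressed as one broadcasted operation."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / std


def profile_report(func, *args, top_n=5):
    """Profile a single call and return the top_n entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top_n)
    return buffer.getvalue()


def best_of(func, *args, repeats=3):
    """Best wall-clock time over a few repeats, in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        times.append(time.perf_counter() - start)
    return min(times)


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(20_000, 12))
    print(profile_report(normalize_loop, X))
    slow, fast = best_of(normalize_loop, X), best_of(normalize_vectorized, X)
    print(f"loop: {slow:.4f} s, vectorized: {fast:.4f} s, speedup: {slow / fast:.1f}x")
```

Checking that both versions return the same values (for example with `np.allclose`) before quoting a speedup keeps the benchmark honest.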
Deliverables:
- 03_profiling_report.ipynb
- Profiling output files
- Optimization code with comments
- Performance comparison table
Self-check:
- Comprehensive profiling
- Multiple optimization techniques
- Quantified improvements
- Code quality and documentation
🔧 Part 4: Model Debugging
Tasks:
- Learning Curves
  - Plot training vs validation curves
  - Diagnose overfitting/underfitting
  - Determine if more data would help
- Convergence Analysis
  - Check for convergence warnings
  - Scale features properly
  - Tune learning rate
  - Verify convergence
- Regularization
  - Try L1 (Lasso) and L2 (Ridge)
  - Create validation curves
  - Find the optimal regularization strength (alpha, or its inverse C in scikit-learn's LogisticRegression)
- Model Comparison
  - Test 3+ models
  - Compare complexity vs performance
  - Justify final model choice
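The convergence and learning-curve checks above might look like the following sketch with scikit-learn. `fit_converged` and `learning_curve_summary` are hypothetical helpers, and the synthetic data from `make_classification` is only a stand-in so the example runs anywhere; the assignment itself calls for a real dataset.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_converged(model, X, y):
    """Fit model and return True if no ConvergenceWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", category=ConvergenceWarning)
        model.fit(X, y)
    return not any(issubclass(w.category, ConvergenceWarning) for w in caught)


def learning_curve_summary(model, X, y, cv=5):
    """Mean train/validation accuracy at five training-set sizes."""
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=cv, train_sizes=np.linspace(0.1, 1.0, 5), scoring="accuracy"
    )
    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)


if __name__ == "__main__":
    # Synthetic stand-in so the sketch runs anywhere; use your real dataset instead.
    X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=0)
    X = X * 1000.0  # exaggerate the feature scale to mimic unscaled inputs

    unscaled = LogisticRegression(max_iter=10)
    scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    print("unscaled, max_iter=10 converged:", fit_converged(unscaled, X, y))
    print("scaled, max_iter=1000 converged:", fit_converged(scaled, X, y))

    sizes, train_acc, val_acc = learning_curve_summary(scaled, X, y)
    for n, tr, va in zip(sizes, train_acc, val_acc):
        print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}")
```

A large, persistent gap between the train and validation columns points toward overfitting; two low, close curves point toward underfitting, where more data is unlikely to help.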
Deliverables:
- 04_model_debugging.ipynb
- Learning curve plots
- Validation curves
- Model comparison table
Self-check:
- Learning curve analysis
- Convergence fixes
- Regularization experiments
- Model selection justification
📊 Part 5: Error Analysis
Tasks:
- Confusion Matrix Analysis
  - Generate confusion matrix
  - Identify most confused pairs
  - Normalize by class
- Per-Class Performance
  - Calculate precision, recall, F1 per class
  - Identify worst-performing classes
  - Visualize performance distribution
- Failure Case Analysis
  - Collect 10+ failure examples
  - Categorize error types
  - Analyze patterns
- Confidence Calibration
  - Plot confidence distribution
  - Separate correct/incorrect predictions
  - Create calibration curve
- Improvement Recommendations
  - List top 3 priority improvements
  - Justify with data
  - Estimate expected impact
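For the confusion-matrix and per-class steps, a possible starting point with scikit-learn is sketched below. `most_confused_pairs` and `per_class_report` are hypothetical helper names, and the labels in the demo are toy values for illustration, not results from any dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support


def most_confused_pairs(y_true, y_pred, labels, top_k=3):
    """Return (true_label, predicted_label, count) for the largest off-diagonal cells."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    order = np.argsort(off_diag, axis=None)[::-1][:top_k]
    pairs = []
    for flat in order:
        i, j = np.unravel_index(flat, off_diag.shape)
        if off_diag[i, j] > 0:
            pairs.append((labels[i], labels[j], int(off_diag[i, j])))
    return pairs


def per_class_report(y_true, y_pred, labels):
    """Precision, recall, F1, and support per class, keyed by label."""
    precision, recall, f1, support = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, zero_division=0
    )
    return {
        label: {"precision": float(p), "recall": float(r), "f1": float(f), "support": int(s)}
        for label, p, r, f, s in zip(labels, precision, recall, f1, support)
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    labels = [0, 1, 2]
    y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 1, 1, 1, 1, 2, 1, 1, 2]

    # normalize="true" divides each row by the number of true samples in that class.
    print(confusion_matrix(y_true, y_pred, labels=labels, normalize="true"))
    print(most_confused_pairs(y_true, y_pred, labels))
    for label, metrics in per_class_report(y_true, y_pred, labels).items():
        print(label, metrics)
```

The most confused pairs are a natural place to start collecting the failure examples, since they show where errors concentrate.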
Deliverables:
- 05_error_analysis.ipynb
- Comprehensive error report
- Failure case visualizations
- Improvement roadmap
Self-check:
- Confusion matrix insights
- Per-class analysis
- Failure case categorization
- Actionable recommendations
📝 Buggy Code Template
# BUGGY ML PIPELINE - FIX ALL ISSUES!
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load data
data = pd.read_csv('your_dataset.csv')
# Bug 1: Not handling missing values
X = data.drop('target', axis=1)
y = data['target']
# Bug 2: Wrong test size
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.9  # Should be 0.2!
)
# Bug 3: Shuffling features and labels independently
np.random.shuffle(X_train.values)
np.random.shuffle(y_train.values)
# Bug 4: Scaling on test data
scaler = StandardScaler()
scaler.fit(X_test) # Should fit on train!
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Bug 5: Row-wise normalization (slow!)
X_train_normalized = []
for row in X_train:
    X_train_normalized.append((row - row.mean()) / row.std())
X_train = np.array(X_train_normalized)
# Bug 6: Wrong labels for training
model = LogisticRegression(max_iter=10) # Bug 7: Too few iterations
model.fit(X_train, y_test) # Should be y_train!
# Bug 8: Evaluating on training data
accuracy = model.score(X_train, y_train)
print(f"Accuracy: {accuracy:.3f}")
# Bug 9: No error handling, logging, or validation
# Bug 10: Not checking for convergence
Your task: Fix ALL bugs and add proper debugging/logging!
🎁 Optional Extensions
Optional Extension 1: Advanced Profiling
- Use line_profiler for line-by-line analysis
- Use memory_profiler with decorators
- Create flame graphs or profiling visualizations
Optional Extension 2: Automated Bug Detection
- Write a function that automatically detects common issues
- Check for: data leakage, scaling problems, shape mismatches
- Create a “pre-flight checklist” tool
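A pre-flight checklist could start as a single function that returns a list of warnings. The sketch below is one possible shape, not a complete detector: `preflight_checks`, its message prefixes, and the `max_std_ratio` threshold are assumptions, and it expects 2-D numeric feature arrays.

```python
import numpy as np


def preflight_checks(X_train, X_test, y_train, y_test, max_std_ratio=100.0):
    """Return a list of warning strings; an empty list means no check fired.

    Assumes 2-D numeric feature arrays. Thresholds are illustrative defaults.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    issues = []

    # Shape mismatches between features and labels, or between splits.
    if len(X_train) != len(y_train):
        issues.append("shape: X_train and y_train have different lengths")
    if len(X_test) != len(y_test):
        issues.append("shape: X_test and y_test have different lengths")
    if X_train.shape[1] != X_test.shape[1]:
        issues.append("shape: train and test have different feature counts")

    # Missing values that most estimators cannot handle.
    if np.isnan(X_train).any() or np.isnan(X_test).any():
        issues.append("missing: NaN values present in features")

    # A test split larger than the training split is usually a mistake.
    if len(X_test) > len(X_train):
        issues.append("split: test set is larger than training set")

    # Leakage: identical feature rows appearing in both splits.
    train_rows = {row.tobytes() for row in X_train}
    overlap = sum(row.tobytes() in train_rows for row in X_test)
    if overlap:
        issues.append(f"leakage: {overlap} test rows also appear in training data")

    # Scaling: feature spreads that differ by orders of magnitude.
    stds = np.nanstd(X_train, axis=0)
    positive = stds[stds > 0]
    if positive.size and positive.max() / positive.min() > max_std_ratio:
        issues.append("scaling: feature standard deviations differ widely")

    return issues


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3)) * np.array([1.0, 1.0, 1e4])
    # Deliberately reuse training rows as the test set to trigger the leakage check.
    for issue in preflight_checks(X, X[:5], np.zeros(100), np.zeros(5)):
        print("WARNING:", issue)
```

Exact row matching only catches the crudest form of leakage; leakage through preprocessing (such as fitting a scaler on test data) needs checks on the pipeline itself.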
Optional Extension 3: A/B Testing
- Compare optimized vs original pipeline
- Run statistical significance tests
- Document confidence intervals
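One hedged way to compare the two pipelines is to score both on the same folds or seeds and analyze the paired differences. The sketch below uses a percentile bootstrap for the confidence interval and a sign-flip permutation test for significance, in plain NumPy; both function names are hypothetical, other tests (such as a paired t-test) are equally defensible, and the scores in the demo are made-up placeholders, not measured results.

```python
import numpy as np


def paired_bootstrap_ci(scores_a, scores_b, n_boot=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean paired difference (b - a)."""
    diffs = np.asarray(scores_b, dtype=float) - np.asarray(scores_a, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)
    tail = (1.0 - confidence) / 2.0 * 100.0
    lower, upper = np.percentile(boot_means, [tail, 100.0 - tail])
    return float(diffs.mean()), float(lower), float(upper)


def sign_flip_p_value(scores_a, scores_b, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation p-value for the mean paired difference."""
    diffs = np.asarray(scores_b, dtype=float) - np.asarray(scores_a, dtype=float)
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, len(diffs)))
    perm_means = (signs * diffs).mean(axis=1)
    extreme = np.sum(np.abs(perm_means) >= abs(diffs.mean()))
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return float((extreme + 1) / (n_perm + 1))


if __name__ == "__main__":
    # Placeholder scores for illustration only; use your own per-fold results.
    original = [0.80, 0.81, 0.79, 0.82, 0.80, 0.81, 0.80, 0.79]
    optimized = [0.84, 0.86, 0.85, 0.87, 0.84, 0.87, 0.85, 0.84]
    mean_diff, lo, hi = paired_bootstrap_ci(original, optimized)
    print(f"mean improvement: {mean_diff:.3f} (95% CI {lo:.3f} to {hi:.3f})")
    print(f"sign-flip p-value: {sign_flip_p_value(original, optimized):.4f}")
```

Pairing matters here: both pipelines must be evaluated on identical splits, otherwise split-to-split noise is mixed into the difference.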
Optional Extension 4: Production Monitoring
- Add comprehensive logging
- Implement error alerts
- Create monitoring dashboard
- Set up performance tracking
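Logging and basic performance tracking can begin with a decorator around each pipeline step. This is a minimal sketch using the standard `logging` module; `log_step`, the `"pipeline"` logger name, and `load_data` are hypothetical, and real alerting or dashboards would sit on top of whatever log sink you choose.

```python
import functools
import logging
import time

logger = logging.getLogger("pipeline")


def log_step(func):
    """Log the start, duration, and any exception of a pipeline step."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        logger.info("start %s", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback, then the error propagates.
            logger.exception(
                "failed %s after %.3f s", func.__name__, time.perf_counter() - start
            )
            raise
        logger.info("done %s in %.3f s", func.__name__, time.perf_counter() - start)
        return result

    return wrapper


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )

    @log_step
    def load_data():
        # Hypothetical step; replace with your real loading code.
        return list(range(10))

    load_data()
```

Because the decorator re-raises, failures stay visible to the caller while still leaving a timestamped record in the log.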
📤 Deliverables
File Structure:
debugging_assignment/
├── README.md (Summary of all work)
├── 01_initial_assessment.md
├── 02_data_quality_report.ipynb
├── 03_profiling_report.ipynb
├── 04_model_debugging.ipynb
├── 05_error_analysis.ipynb
├── data/
│   └── dataset.csv
├── figures/
│   ├── confusion_matrix.png
│   ├── learning_curves.png
│   └── ...
└── src/
    ├── fixed_pipeline.py
    └── debugging_utils.py
Documentation Requirements:
- Clear markdown headers and sections
- Code comments explaining fixes
- Visualizations with titles and labels
- Summary of improvements in README
Code Quality:
- PEP 8 compliance
- Proper error handling
- Logging throughout
- Type hints (bonus)
🎯 Review Summary
| Component | Relative Emphasis |
|---|---|
| Part 1: Initial Assessment | Medium |
| Part 2: Data Quality | High |
| Part 3: Performance Optimization | High |
| Part 4: Model Debugging | High |
| Part 5: Error Analysis | High |
| Optional Extensions | Extra depth |
💡 Tips for Success
- Work Systematically
  - Follow the debugging workflow
  - Document each fix
  - Test after each change
- Measure Everything
  - Baseline first, then optimize
  - Quantify all improvements
  - Compare before/after
- Visualize
  - Plots reveal patterns
  - Show distributions
  - Highlight insights
- Document Well
  - Explain WHY you made changes
  - Justify decisions with data
  - Write for future you
- Test Thoroughly
  - Verify fixes work
  - Check edge cases
  - Validate with different seeds
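Validating with different seeds can be as simple as repeating the split and fit several times and reporting the spread. The sketch below assumes a classification task and a scaled logistic regression; `scores_across_seeds` is a hypothetical helper, and the synthetic data is only a stand-in for your real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def scores_across_seeds(X, y, seeds=(0, 1, 2, 3, 4), test_size=0.2):
    """Held-out accuracy for one stratified train/test split per seed."""
    scores = []
    for seed in seeds:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed, stratify=y
        )
        # The scaler lives inside the pipeline, so it is fit on training data only.
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    return np.array(scores)


if __name__ == "__main__":
    # Synthetic stand-in so the sketch runs anywhere; use your real dataset instead.
    X, y = make_classification(n_samples=1000, n_features=12, n_informative=6, random_state=0)
    scores = scores_across_seeds(X, y)
    print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} seeds")
```

If the spread across seeds is as large as the improvement you are claiming, the improvement is not yet established.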
📅 Timeline
Week 1:
- Complete Parts 1-2 (Initial Assessment + Data Quality)
Week 2:
- Complete Part 3 (Performance Optimization)
Week 3:
- Complete Parts 4-5 (Model Debugging + Error Analysis)
Week 4:
- Optional extensions
- Final polish and packaging
🆘 Getting Help
- Discussion Forum: GitHub Discussions
- Example Notebooks: Revisit the phase notebooks and your own failing experiments
- Best help request: Include the bug, the evidence, and what you already ruled out
✅ Final Checklist
Before you call this project complete, verify:
- All 5 required reports completed (01 as Markdown, 02-05 as notebooks)
- All bugs in template code fixed
- Performance improvements quantified
- Comprehensive error analysis
- Clear documentation throughout
- Code runs without errors
- README.md summarizes all work
- Files organized properly
- Visualizations included
- Project is packaged clearly enough that someone else could run and review it
Good luck with your debugging! Remember: Every bug you fix makes you a better ML engineer! 🚀