Debugging & Troubleshooting Challenges
Progressive challenges to master ML debugging skills.
Challenge 1: The Mystery Bug ⭐⭐
Difficulty: Beginner
Time: 30-45 minutes
Topic: Basic debugging workflow
Scenario
A junior data scientist wrote code that “works” but gives terrible results. Find and fix the bugs!
Code
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Shuffle (for randomness!)
np.random.shuffle(X_train)
np.random.shuffle(y_train)
# Train
model = LogisticRegression()
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.3f}")
Requirements
- Identify all bugs (there are 2)
- Explain why each is a problem
- Fix the code
- Document expected vs actual behavior
- Verify fix with multiple random seeds
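One possible way to handle the last requirement is a small seed-sweep harness. The sketch below is illustrative only: `run_pipeline` and `summarize_over_seeds` are hypothetical names, and the stand-in pipeline uses a smaller synthetic dataset than the challenge code, so replace its body with your own fixed version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def run_pipeline(seed):
    """Hypothetical stand-in: replace the body with your fixed Challenge 1 code."""
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


def summarize_over_seeds(pipeline, seeds):
    """Run the pipeline once per seed and summarize the accuracy spread."""
    scores = np.array([pipeline(seed) for seed in seeds], dtype=float)
    return {
        "mean": float(scores.mean()),
        "std": float(scores.std()),
        "min": float(scores.min()),
        "max": float(scores.max()),
    }


summary = summarize_over_seeds(run_pipeline, seeds=range(5))
print(summary)
```

A large gap between `min` and `max`, or a mean close to chance level, suggests the result still depends on luck rather than on the model.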
Success Criteria
- ✅ Bugs identified correctly
- ✅ Clear explanation of issues
- ✅ Fixed code achieves >85% accuracy
- ✅ Documented properly
Learning Objectives
- Recognize data leakage patterns
- Understand feature-label alignment
- Apply debugging workflow
Challenge 2: Data Detective ⭐⭐⭐
Difficulty: Intermediate
Time: 1-2 hours
Topic: Data quality issues
Scenario
Your model performs well in training but fails in production. Investigate the dataset!
Dataset
Download: UCI Adult Income dataset
Requirements
- Missing Value Analysis
  - Find all columns with missing data
  - Calculate missing percentage
  - Recommend handling strategy per column
  - Implement and compare 2 strategies
- Outlier Detection
  - Apply Z-score method
  - Apply IQR method
  - Compare results
  - Visualize outliers
  - Decide on handling approach
- Duplicate Detection
  - Find exact duplicates
  - Find near-duplicates (optional)
  - Analyze impact on model
- Distribution Shift
  - Split data by time/group
  - Test for distribution shift
  - Quantify the shift
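For the outlier step, the two methods named above can be sketched in a few lines of NumPy. The function names, the Z-score threshold of 3, and the IQR multiplier of 1.5 are conventional defaults chosen here as assumptions, not values prescribed by this challenge, and the sample data is made up for illustration.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # Constant column: nothing can be an outlier.
        return np.zeros(values.shape, dtype=bool)
    return np.abs((values - values.mean()) / std) > threshold


def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up sample: twenty ordinary values plus one extreme value.
sample = np.array(list(range(10, 30)) + [200], dtype=float)
z_mask = zscore_outliers(sample)
iqr_mask = iqr_outliers(sample)
print("Z-score flags:", np.flatnonzero(z_mask))
print("IQR flags:", np.flatnonzero(iqr_mask))
print("Flagged by only one method:", np.flatnonzero(z_mask ^ iqr_mask))
```

The Z-score method uses the mean and standard deviation, which the outliers themselves inflate, so on skewed columns the two masks can disagree. Comparing them per column is the point of the exercise.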
Deliverables
- Jupyter notebook with full analysis
- Visualizations for each issue type
- Before/after performance comparison
- Summary report with recommendations
Success Criteria
- ✅ All data issues identified
- ✅ Multiple detection methods used
- ✅ Clear visualizations
- ✅ Quantified improvements
Optional Stretch (⭐)
- Detect label noise
- Create automated data quality report
- Build data validation pipeline
Challenge 3: Speed Demon ⭐⭐⭐
Difficulty: Intermediate
Time: 2-3 hours
Topic: Performance optimization
Scenario
Your ML pipeline is too slow for production. Optimize it!
Code (Slow Pipeline)
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# Load data (1M rows)
data = pd.DataFrame(np.random.randn(1000000, 50))
target = np.random.randint(0, 2, 1000000)
# Slow preprocessing
processed = []
for idx, row in data.iterrows():
    normalized = (row - row.mean()) / row.std()
    processed.append(normalized)
data_processed = pd.DataFrame(processed)
# Slow predictions
model = RandomForestClassifier(n_estimators=100)
model.fit(data_processed[:800000], target[:800000])
predictions = []
for i in range(800000, 1000000):
    pred = model.predict(data_processed.iloc[i:i+1])[0]
    predictions.append(pred)
Requirements
- Profile the code
  - Use cProfile
  - Identify top 3 bottlenecks
  - Calculate time percentage for each
- Optimize
  - Vectorize preprocessing
  - Batch predictions
  - Use parallel processing
  - Cache where applicable
- Benchmark
  - Measure before/after speed
  - Measure memory usage
  - Create comparison table
  - Verify results match
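As a starting point for the profiling step, here is a minimal sketch built on the standard-library `cProfile` and `pstats` modules. The helper `profile_top` and the toy workload `slow_sum` are hypothetical names used for illustration; in practice you would pass a function that wraps a scaled-down version of the slow pipeline.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, top_n=3, **kwargs):
    """Run func under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top_n)
    return result, buffer.getvalue()


def slow_sum(n):
    """Toy workload standing in for a slow pipeline stage."""
    total = 0
    for i in range(n):
        total += i * i
    return total


value, report = profile_top(slow_sum, 100_000)
print(report)
```

Profiling all one million rows can take a long time, so it is usually worth profiling a sample first and confirming that the ranking of bottlenecks stays the same as the sample grows.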
Success Criteria
- ✅ Minimum 10x speedup
- ✅ Memory usage reduced
- ✅ Results identical to original
- ✅ Code is readable and documented
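To check the speedup and "results identical" criteria together, a small comparison helper can time both versions and confirm that their outputs agree. This is a sketch under assumptions: `compare_implementations` is a hypothetical helper, and the two toy functions stand in for your original and optimized code.

```python
import time

import numpy as np


def compare_implementations(baseline, candidate, data, repeats=3):
    """Time two functions on the same input and check that their outputs agree."""

    def best_time(func):
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            output = func(data)
            timings.append(time.perf_counter() - start)
        # The minimum over repeats is less sensitive to background noise.
        return min(timings), output

    base_time, base_out = best_time(baseline)
    cand_time, cand_out = best_time(candidate)
    return {
        "baseline_s": base_time,
        "candidate_s": cand_time,
        "speedup": base_time / cand_time if cand_time > 0 else float("inf"),
        "match": bool(np.allclose(base_out, cand_out)),
    }


def squares_loop(values):
    """Toy baseline: element-by-element Python loop."""
    return [v * v for v in values]


def squares_vectorized(values):
    """Toy candidate: single vectorized NumPy call."""
    return np.square(values)


result = compare_implementations(squares_loop, squares_vectorized, np.arange(10_000.0))
print(result)
```

Note that `RandomForestClassifier` is randomized, so predictions from two separately trained models can only be compared exactly if both runs set the same `random_state`.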
Optional Stretch (⭐)
- Achieve 50x+ speedup
- Use line_profiler
- Create performance visualization
- Optimize memory further
Challenge 4: Convergence Crisis ⭐⭐⭐⭐
Difficulty: Advanced
Time: 2-3 hours
Topic: Model debugging
Scenario
Your neural network won’t converge. Debug and fix it!
Code
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
X, y = make_classification(
    n_samples=10000, n_features=100,
    n_informative=50, random_state=42
)
# Don't scale features
model = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    learning_rate_init=1.0,  # High learning rate
    max_iter=10,  # Too few iterations
    random_state=42,
    verbose=True
)
model.fit(X, y)
print(f"Score: {model.score(X, y):.3f}")
Requirements
- Diagnose Issues
  - Identify all problems
  - Explain impact of each
  - Prioritize fixes
- Fix Systematically
  - Scale features
  - Tune learning rate
  - Adjust iterations
  - Monitor convergence
- Learning Curves
  - Plot training vs validation loss
  - Diagnose overfitting/underfitting
  - Apply regularization if needed
- Hyperparameter Tuning
  - Try different architectures
  - Test learning rates: [0.001, 0.01, 0.1]
  - Create validation curves
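For the "monitor convergence" step, one option is to capture scikit-learn's `ConvergenceWarning` and inspect the fitted model's `loss_curve_` and `n_iter_` attributes. The helper `fit_and_report` below is a hypothetical name, and the tiny dataset and network are deliberately small assumptions so the sketch runs quickly; they are not the challenge configuration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier


def fit_and_report(model, X, y):
    """Fit the model and report whether it stopped at the iteration limit."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return {
        "n_iter": model.n_iter_,
        "final_loss": model.loss_curve_[-1],
        "converged": not hit_limit,
    }


# Small demo data; max_iter=3 is intentionally too low to converge.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
short_run = MLPClassifier(hidden_layer_sizes=(8,), max_iter=3, random_state=0)
report = fit_and_report(short_run, X_demo, y_demo)
print(report)
```

Plotting `model.loss_curve_` for each candidate learning rate gives a quick visual check: a curve that oscillates or rises points to a learning rate that is too high, while a curve still falling steeply at the last iteration points to too few iterations.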
Success Criteria
- ✅ Model converges properly
- ✅ No convergence warnings
- ✅ Test accuracy >85%
- ✅ Learning curves look healthy
Optional Stretch (⭐)
- Implement early stopping
- Use GridSearchCV
- Compare with other models
- Analyze gradient flow
Challenge 5: Error Analyzer ⭐⭐⭐⭐
Difficulty: Advanced
Time: 3-4 hours
Topic: Error analysis
Scenario
Your model has 90% accuracy but fails on critical cases. Analyze and improve!
Dataset
Use MNIST digits or a similar multi-class dataset
Requirements
- Confusion Matrix Analysis
  - Generate confusion matrix
  - Identify top 5 confused pairs
  - Visualize normalized matrix
  - Explain patterns
- Per-Class Performance
  - Calculate precision/recall/F1 per class
  - Identify worst 3 classes
  - Analyze why they perform poorly
  - Propose class-specific fixes
- Failure Case Analysis
  - Collect 20+ failure examples
  - Categorize error types
  - Visualize failure cases
  - Find common patterns
- Confidence Analysis
  - Plot confidence distribution
  - Separate correct/incorrect
  - Find high-confidence errors
  - Create calibration curve
- Action Plan
  - Prioritize improvements
  - Estimate impact
  - Propose data collection strategy
  - Suggest model changes
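The "top confused pairs" step can be sketched by zeroing the diagonal of the confusion matrix and ranking the remaining cells. `top_confused_pairs` is a hypothetical helper name, and the labels in the usage example are made up for illustration rather than taken from MNIST.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def top_confused_pairs(y_true, y_pred, k=5):
    """Return up to k (true_label, predicted_label, count) tuples, largest first."""
    labels = np.unique(np.concatenate([np.asarray(y_true), np.asarray(y_pred)]))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, off_diag.shape)
    label_list = labels.tolist()
    return [
        (label_list[r], label_list[c], int(off_diag[r, c]))
        for r, c in zip(rows, cols)
        if off_diag[r, c] > 0
    ]


# Made-up labels: class 0 is often predicted as 1, class 2 once as 0.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 1, 2, 0, 2]
pairs = top_confused_pairs(y_true, y_pred)
print(pairs)
```

The pairs are directional: (true 0, predicted 1) and (true 1, predicted 0) are counted separately, which matters when one direction of the mistake is more costly than the other.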
Deliverables
- Complete error analysis report
- Visualizations for all analyses
- Categorized failure cases
- Detailed improvement roadmap
Success Criteria
- ✅ Comprehensive confusion matrix analysis
- ✅ All classes analyzed
- ✅ Failure patterns identified
- ✅ Actionable recommendations
Optional Stretch (⭐)
- Implement one improvement
- Show before/after comparison
- Create error monitoring dashboard
Challenge 6: The Production Mystery ⭐⭐⭐⭐⭐
Difficulty: Expert
Time: 4-6 hours
Topic: Real-world debugging
Scenario
Your model works perfectly in development but fails in production. Why?
Given Information
- Training accuracy: 95%
- Test accuracy: 94%
- Production accuracy: 65% (after 1 month)
- No code changes were made
- Different data source in production
Requirements
- Hypothesis Generation
  - List 5+ possible causes
  - Rank by likelihood
  - Plan investigation steps
- Distribution Shift Analysis
  - Compare train vs production distributions
  - Statistical tests (K-S, chi-square)
  - Visualize differences
  - Quantify shift magnitude
- Feature Drift Detection
  - Monitor feature statistics
  - Detect out-of-range values
  - Identify concept drift
  - Track label distribution
- Root Cause Analysis
  - Investigate data pipeline
  - Check preprocessing consistency
  - Validate assumptions
  - Document findings
- Solutions
  - Propose fixes
  - Implement monitoring
  - Create retraining strategy
  - Build alerting system
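For the K-S part of the distribution shift analysis, a per-feature two-sample test with `scipy.stats.ks_2samp` is one reasonable starting point for numeric features. The helper name `detect_feature_drift`, the significance level of 0.01, and the synthetic data are all assumptions made for this sketch.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_feature_drift(reference, production, alpha=0.01):
    """Return (feature_index, ks_statistic, p_value) for features that appear shifted."""
    reference = np.asarray(reference, dtype=float)
    production = np.asarray(production, dtype=float)
    drifted = []
    for j in range(reference.shape[1]):
        result = ks_2samp(reference[:, j], production[:, j])
        if result.pvalue < alpha:
            drifted.append((j, float(result.statistic), float(result.pvalue)))
    return drifted


# Synthetic example: feature 0 is unchanged, feature 1 is shifted by 2.0.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 2))
production = reference.copy()
production[:, 1] += 2.0
drifted = detect_feature_drift(reference, production)
print(drifted)
```

With many features, some tests will fall below `alpha` by chance, so the K-S statistic itself is often a more useful measure of shift magnitude than the p-value. Categorical features need a different test, such as chi-square.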
Deliverables
- Investigation report
- Distribution analysis
- Monitoring dashboard design
- Retraining pipeline proposal
- Documentation for ops team
Success Criteria
- ✅ Root cause identified
- ✅ Distribution shift quantified
- ✅ Solution implemented
- ✅ Monitoring in place
- ✅ Future prevention strategy
Optional Stretch (⭐⭐)
- Implement automated retraining
- Create A/B testing framework
- Build model versioning system
- Deploy monitoring dashboard
Challenge 7: Debug the Debugger ⭐⭐⭐⭐⭐
Difficulty: Expert
Time: 5-8 hours
Topic: Comprehensive debugging
Scenario
Build a comprehensive debugging toolkit for ML pipelines!
Requirements
- Automated Bug Detection
  class MLPipelineDebugger:
      def check_data_leakage(self, pipeline):
          # Detect common leakage patterns
          pass

      def check_feature_target_alignment(self, X, y):
          # Verify alignment
          pass

      def check_scaling_issues(self, scaler, X_train, X_test):
          # Detect scaling problems
          pass

      def check_class_imbalance(self, y):
          # Identify severe imbalance
          pass

      def check_convergence(self, model):
          # Verify model converged
          pass
- Performance Profiler
  - CPU time tracking
  - Memory usage monitoring
  - Bottleneck identification
  - Optimization suggestions
- Model Health Checker
  - Overfitting detection
  - Underfitting detection
  - Learning curve analysis
  - Validation curve generation
- Error Analyzer
  - Automatic confusion matrix
  - Per-class metrics
  - Failure case collection
  - Confidence analysis
- Report Generator
  - Create HTML report
  - Include all analyses
  - Actionable recommendations
  - Export to PDF
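To show the general shape a checker might take, here is a partial sketch of two of the methods from the skeleton. It is not a reference solution: the class name `MLPipelineDebuggerSketch`, the `min_ratio` parameter, its default of 0.1, and the return formats are all assumptions made for illustration.

```python
import numpy as np


class MLPipelineDebuggerSketch:
    """Illustrative partial implementation; thresholds are arbitrary assumptions."""

    def check_feature_target_alignment(self, X, y):
        """Return a list of human-readable issues (empty if none were found)."""
        issues = []
        if len(X) != len(y):
            issues.append(f"X has {len(X)} rows but y has {len(y)} labels")
        return issues

    def check_class_imbalance(self, y, min_ratio=0.1):
        """Report class counts and flag a low minority-to-majority ratio."""
        classes, counts = np.unique(np.asarray(y), return_counts=True)
        ratio = counts.min() / counts.max()
        return {
            "counts": dict(zip(classes.tolist(), counts.tolist())),
            "minority_to_majority": float(ratio),
            "severe": bool(ratio < min_ratio),
        }


debugger = MLPipelineDebuggerSketch()
alignment_issues = debugger.check_feature_target_alignment(np.zeros((5, 2)), [0, 1, 0])
imbalance = debugger.check_class_imbalance([0] * 95 + [1] * 5)
print(alignment_issues)
print(imbalance)
```

A length check only catches the most obvious misalignment. Rows and labels that were shuffled independently still have matching lengths, so a fuller checker needs another signal, for example comparing cross-validated accuracy against a chance-level baseline.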
Deliverables
- ml_debugger.py: Complete toolkit
- Test suite with 10+ test cases
- Documentation with examples
- Sample reports (HTML/PDF)
- Tutorial notebook
Success Criteria
- ✅ All checkers implemented
- ✅ Catches common bugs
- ✅ Works with sklearn models
- ✅ Generates useful reports
- ✅ Well documented
Optional Stretch (⭐⭐⭐)
- Support PyTorch/TensorFlow
- Add interactive visualizations
- Create CLI tool
- Publish as package
- Add CI/CD integration
🏆 Completion Tracker
Track your progress:
- Challenge 1: The Mystery Bug ⭐⭐
- Challenge 2: Data Detective ⭐⭐⭐
- Challenge 3: Speed Demon ⭐⭐⭐
- Challenge 4: Convergence Crisis ⭐⭐⭐⭐
- Challenge 5: Error Analyzer ⭐⭐⭐⭐
- Challenge 6: The Production Mystery ⭐⭐⭐⭐⭐
- Challenge 7: Debug the Debugger ⭐⭐⭐⭐⭐
💡 Tips
- Start Simple: Begin with Challenge 1 and work through the challenges in order
- Document Everything: Keep notes on what you tried
- Measure First: Always establish a baseline before optimizing
- Test Thoroughly: Verify fixes work with different data
- Learn from Failures: Every bug teaches something valuable
📚 Resources
- Python Debugging Guide
- Scikit-learn Common Pitfalls
- A Recipe for Training Neural Networks - Andrej Karpathy
Happy Debugging! 🐛→🦋