Debugging & Troubleshooting Challenges

Seven progressively harder challenges for building ML debugging skills.

Challenge 1: The Mystery Bug ⭐⭐

Difficulty: Beginner
Time: 30-45 minutes
Topic: Basic debugging workflow

Scenario

A junior data scientist wrote code that “works” but gives terrible results. Find and fix the bugs!

Code


import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
 
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
 
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
 
# Shuffle (for randomness!)
np.random.shuffle(X_train)
np.random.shuffle(y_train)
 
# Train
model = LogisticRegression()
model.fit(X_train, y_train)
 
print(f"Accuracy: {model.score(X_test, y_test):.3f}")

Requirements

Identify all bugs (there are 2)
Explain why each is a problem
Fix the code
Document expected vs actual behavior
Verify fix with multiple random seeds
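The last requirement, verifying the fix across random seeds, could be approached with a small harness like the one below. This is a sketch rather than a reference solution: `sweep_seeds` and `candidate_pipeline` are hypothetical names, and the body of `candidate_pipeline` is only a stand-in that you would replace with your own fixed code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def sweep_seeds(score_fn, seeds):
    """Call score_fn(seed) for every seed and summarize the spread of scores."""
    scores = np.array([score_fn(seed) for seed in seeds])
    return {
        "mean": float(scores.mean()),
        "std": float(scores.std()),
        "min": float(scores.min()),
        "max": float(scores.max()),
    }


def candidate_pipeline(seed):
    """Hypothetical stand-in: replace this body with your fixed pipeline."""
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    # The seed controls the split, so each run sees a different train/test partition.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


summary = sweep_seeds(candidate_pipeline, seeds=range(10))
print(summary)
```

A low `min` or a large `std` across seeds suggests the result depends on a lucky split rather than on a genuine fix.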

Success Criteria

✅ Bugs identified correctly
✅ Clear explanation of issues
✅ Fixed code achieves >85% accuracy
✅ Documented properly

Learning Objectives

Recognize data leakage patterns
Understand feature-label alignment
Apply debugging workflow

Challenge 2: Data Detective ⭐⭐⭐

Difficulty: Intermediate
Time: 1-2 hours
Topic: Data quality issues

Scenario

Your model performs well in training but fails in production. Investigate the dataset!

Dataset

Download: UCI Adult Income dataset

Requirements

Missing Value Analysis
- Find all columns with missing data
- Calculate missing percentage
- Recommend handling strategy per column
- Implement and compare 2 strategies
Outlier Detection
- Apply Z-score method
- Apply IQR method
- Compare results
- Visualize outliers
- Decide on handling approach
Duplicate Detection
- Find exact duplicates
- Find near-duplicates (optional)
- Analyze impact on model
Distribution Shift
- Split data by time/group
- Test for distribution shift
- Quantify the shift
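As a starting point for the missing-value and outlier requirements above, the sketch below runs on a small synthetic DataFrame. The column names, the injected gaps and outliers, and the helper names `zscore_outliers` and `iqr_outliers` are all assumptions made for illustration; swap in the real dataset once it is loaded.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (not the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 12, 1000),
    "hours_per_week": rng.normal(40, 10, 1000),
})
df.loc[rng.choice(1000, 50, replace=False), "age"] = np.nan  # inject 5% missing
df.loc[:4, "hours_per_week"] = 200.0  # inject 5 extreme values (rows 0-4)

# Missing percentage per column, worst first.
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)


def zscore_outliers(series, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    z = (series - series.mean()) / series.std()
    return z.abs() > threshold


def iqr_outliers(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


z_mask = zscore_outliers(df["hours_per_week"])
iqr_mask = iqr_outliers(df["hours_per_week"])

# Cross-tabulate the two methods to see where they agree and disagree.
agreement = pd.crosstab(z_mask.rename("zscore"), iqr_mask.rename("iqr"))

# Exact duplicate rows.
duplicate_rows = int(df.duplicated().sum())

print(missing_pct)
print(agreement)
print("exact duplicates:", duplicate_rows)
```

The two methods often disagree because extreme values inflate the standard deviation used by the Z-score, while quartiles are less affected; the cross-tabulation makes that difference visible.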

Deliverables

Jupyter notebook with full analysis
Visualizations for each issue type
Before/after performance comparison
Summary report with recommendations

Success Criteria

✅ All data issues identified
✅ Multiple detection methods used
✅ Clear visualizations
✅ Quantified improvements

Optional Stretch (⭐)

Detect label noise
Create automated data quality report
Build data validation pipeline

Challenge 3: Speed Demon ⭐⭐⭐

Difficulty: Intermediate
Time: 2-3 hours
Topic: Performance optimization

Scenario

Your ML pipeline is too slow for production. Optimize it!

Code (Slow Pipeline)


import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
 
# Load data (1M rows)
data = pd.DataFrame(np.random.randn(1000000, 50))
target = np.random.randint(0, 2, 1000000)
 
# Slow preprocessing
processed = []
for idx, row in data.iterrows():
    normalized = (row - row.mean()) / row.std()
    processed.append(normalized)
data_processed = pd.DataFrame(processed)
 
# Slow predictions
model = RandomForestClassifier(n_estimators=100)
model.fit(data_processed[:800000], target[:800000])
 
predictions = []
for i in range(800000, 1000000):
    pred = model.predict(data_processed.iloc[i:i+1])[0]
    predictions.append(pred)

Requirements

Profile the code
- Use cProfile
- Identify top 3 bottlenecks
- Calculate time percentage for each
Optimize
- Vectorize preprocessing
- Batch predictions
- Use parallel processing
- Cache where applicable
Benchmark
- Measure before/after speed
- Measure memory usage
- Create comparison table
- Verify results match
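The profiling and vectorization steps above might look like the sketch below, run on a reduced sample so it finishes quickly. The helper names (`normalize_rows_loop`, `normalize_rows_vectorized`, `profile_call`) are hypothetical. One detail worth noting: pandas `Series.std()` defaults to `ddof=1` while NumPy defaults to `ddof=0`, so the vectorized version passes `ddof=1` to reproduce the loop's output.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd


def normalize_rows_loop(df):
    """Row-by-row normalization, mirroring the slow pipeline."""
    rows = []
    for _, row in df.iterrows():
        rows.append((row - row.mean()) / row.std())
    return pd.DataFrame(rows)


def normalize_rows_vectorized(df):
    """Same computation on the whole array at once."""
    values = df.to_numpy()
    mean = values.mean(axis=1, keepdims=True)
    std = values.std(axis=1, ddof=1, keepdims=True)  # ddof=1 matches pandas
    return pd.DataFrame((values - mean) / std, index=df.index, columns=df.columns)


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
    return result, stream.getvalue()


rng = np.random.default_rng(0)
sample = pd.DataFrame(rng.standard_normal((2000, 50)))  # reduced from 1M rows

slow_result, slow_report = profile_call(normalize_rows_loop, sample)
fast_result, fast_report = profile_call(normalize_rows_vectorized, sample)

results_match = np.allclose(slow_result.to_numpy(), fast_result.to_numpy())
print(slow_report)
print("results match:", results_match)
```

Checking `np.allclose` before comparing timings guards against an optimization that is fast only because it computes something different.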

Success Criteria

✅ Minimum 10x speedup
✅ Memory usage reduced
✅ Results match the original (fix random_state on the model so predictions are comparable across runs)
✅ Code is readable and documented

Optional Stretch (⭐)

Achieve 50x+ speedup
Use line_profiler
Create performance visualization
Optimize memory further

Challenge 4: Convergence Crisis ⭐⭐⭐⭐

Difficulty: Advanced
Time: 2-3 hours
Topic: Model debugging

Scenario

Your neural network won’t converge. Debug and fix it!

Code


import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
 
X, y = make_classification(
    n_samples=10000, n_features=100, 
    n_informative=50, random_state=42
)
 
# Don't scale features
model = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    learning_rate_init=1.0,  # High learning rate
    max_iter=10,  # Too few iterations
    random_state=42,
    verbose=True
)
 
model.fit(X, y)
print(f"Score: {model.score(X, y):.3f}")

Requirements

Diagnose Issues
- Identify all problems
- Explain impact of each
- Prioritize fixes
Fix Systematically
- Scale features
- Tune learning rate
- Adjust iterations
- Monitor convergence
Learning Curves
- Plot training vs validation loss
- Diagnose overfitting/underfitting
- Apply regularization if needed
Hyperparameter Tuning
- Try different architectures
- Test learning rates: [0.001, 0.01, 0.1]
- Create validation curves
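One way to work through the fixes listed above is sketched below: scale inside a pipeline, hold out a test set, and compare the suggested learning rates. The sample count is reduced to keep the run short, and `max_iter=300` plus the `results` dictionary layout are assumptions for illustration, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Smaller sample than the challenge (assumption, for speed).
X, y = make_classification(
    n_samples=2000, n_features=100, n_informative=50, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

results = {}
for lr in [0.001, 0.01, 0.1]:
    # The scaler sits inside the pipeline, so it is fit on training data only.
    pipeline = make_pipeline(
        StandardScaler(),
        MLPClassifier(
            hidden_layer_sizes=(100, 50),
            learning_rate_init=lr,
            max_iter=300,
            random_state=42,
        ),
    )
    pipeline.fit(X_train, y_train)
    mlp = pipeline.named_steps["mlpclassifier"]
    results[lr] = {
        "test_accuracy": pipeline.score(X_test, y_test),
        "n_iter": mlp.n_iter_,
        "final_loss": mlp.loss_curve_[-1],
        # Stopping before max_iter means the tolerance criterion was met.
        "stopped_early": mlp.n_iter_ < mlp.max_iter,
    }

for lr, info in results.items():
    print(lr, info)
```

The per-epoch training losses in `mlp.loss_curve_` can be plotted directly to monitor convergence for each learning rate.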

Success Criteria

✅ Model converges properly
✅ No convergence warnings
✅ Accuracy >85% on a held-out test set (the starter code scores on its training data)
✅ Learning curves look healthy

Optional Stretch (⭐)

Implement early stopping
Use GridSearchCV
Compare with other models
Analyze gradient flow

Challenge 5: Error Analyzer ⭐⭐⭐⭐

Difficulty: Advanced
Time: 3-4 hours
Topic: Error analysis

Scenario

Your model has 90% accuracy but fails on critical cases. Analyze and improve!

Dataset

Use the MNIST digits dataset or a similar multi-class dataset

Requirements

Confusion Matrix Analysis
- Generate confusion matrix
- Identify top 5 confused pairs
- Visualize normalized matrix
- Explain patterns
Per-Class Performance
- Calculate precision/recall/F1 per class
- Identify worst 3 classes
- Analyze why they perform poorly
- Propose class-specific fixes
Failure Case Analysis
- Collect 20+ failure examples
- Categorize error types
- Visualize failure cases
- Find common patterns
Confidence Analysis
- Plot confidence distribution
- Separate correct/incorrect
- Find high-confidence errors
- Create calibration curve
Action Plan
- Prioritize improvements
- Estimate impact
- Propose data collection strategy
- Suggest model changes
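The confusion-pair, per-class, and confidence analyses above could begin along these lines. The sketch uses scikit-learn's bundled `load_digits` dataset and a logistic regression model as stand-ins (both are assumptions, chosen because they run offline); the 0.9 confidence cutoff is likewise arbitrary.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
proba = model.predict_proba(X_test)

# Most confused (true, predicted) pairs: zero the diagonal, sort what is left.
cm = confusion_matrix(y_test, y_pred)
errors = cm.copy()
np.fill_diagonal(errors, 0)
order = np.argsort(errors, axis=None)[::-1][:5]
true_idx, pred_idx = np.unravel_index(order, errors.shape)
top_pairs = [
    (int(t), int(p), int(errors[t, p]))
    for t, p in zip(true_idx, pred_idx)
    if errors[t, p] > 0
]

# Per-class metrics and the three classes with the lowest F1.
precision, recall, f1, support = precision_recall_fscore_support(
    y_test, y_pred, zero_division=0
)
worst_classes = np.argsort(f1)[:3]

# High-confidence errors: wrong predictions made with probability above 0.9.
confidence = proba.max(axis=1)
wrong = y_pred != y_test
high_conf_errors = np.flatnonzero(wrong & (confidence > 0.9))

print("top confused pairs (true, predicted, count):", top_pairs)
print("worst classes by F1:", worst_classes)
print("high-confidence errors:", len(high_conf_errors))
```

The indices in `high_conf_errors` point back into `X_test`, which makes it straightforward to pull those samples out for visual inspection.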

Deliverables

Complete error analysis report
Visualizations for all analyses
Categorized failure cases
Detailed improvement roadmap

Success Criteria

✅ Comprehensive confusion matrix analysis
✅ All classes analyzed
✅ Failure patterns identified
✅ Actionable recommendations

Optional Stretch (⭐)

Implement one improvement
Show before/after comparison
Create error monitoring dashboard

Challenge 6: The Production Mystery ⭐⭐⭐⭐⭐

Difficulty: Expert
Time: 4-6 hours
Topic: Real-world debugging

Scenario

Your model works perfectly in development but fails in production. Why?

Given Information

Training accuracy: 95%
Test accuracy: 94%
Production accuracy: 65% (after 1 month)
No code changes were made
Production reads from a different data source than training

Requirements

Hypothesis Generation
- List 5+ possible causes
- Rank by likelihood
- Plan investigation steps
Distribution Shift Analysis
- Compare train vs production distributions
- Run statistical tests (Kolmogorov-Smirnov for numeric features, chi-square for categorical ones)
- Visualize differences
- Quantify shift magnitude
Feature Drift Detection
- Monitor feature statistics
- Detect out-of-range values
- Identify concept drift
- Track label distribution
Root Cause Analysis
- Investigate data pipeline
- Check preprocessing consistency
- Validate assumptions
- Document findings
Solutions
- Propose fixes
- Implement monitoring
- Create retraining strategy
- Build alerting system
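The statistical tests named above are available in SciPy. The sketch below applies them to synthetic "training" and "production" samples with drift injected on purpose; the distributions, sample sizes, and category labels are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

rng = np.random.default_rng(0)

# Numeric feature: production is shifted and wider (synthetic drift).
train_feature = rng.normal(0.0, 1.0, 5000)
prod_feature = rng.normal(0.5, 1.2, 5000)

# Two-sample K-S test: the statistic is the largest gap between the two
# empirical CDFs, so it doubles as a rough measure of shift magnitude.
ks_result = ks_2samp(train_feature, prod_feature)

# Categorical feature: category frequencies change in production.
categories = ["a", "b", "c"]
train_cat = rng.choice(categories, size=5000, p=[0.5, 0.3, 0.2])
prod_cat = rng.choice(categories, size=5000, p=[0.2, 0.3, 0.5])

# Contingency table: one row per data source, one column per category.
table = np.array([
    [np.sum(train_cat == c) for c in categories],
    [np.sum(prod_cat == c) for c in categories],
])
chi2, p_value, dof, expected = chi2_contingency(table)

print(f"K-S statistic={ks_result.statistic:.3f}, p={ks_result.pvalue:.3g}")
print(f"chi-square={chi2:.1f}, dof={dof}, p={p_value:.3g}")
```

With large samples even tiny shifts produce small p-values, so it helps to report the test statistic alongside the p-value when quantifying drift.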

Deliverables

Investigation report
Distribution analysis
Monitoring dashboard design
Retraining pipeline proposal
Documentation for ops team

Success Criteria

✅ Root cause identified
✅ Distribution shift quantified
✅ Solution implemented
✅ Monitoring in place
✅ Future prevention strategy

Optional Stretch (⭐⭐)

Implement automated retraining
Create A/B testing framework
Build model versioning system
Deploy monitoring dashboard

Challenge 7: Debug the Debugger ⭐⭐⭐⭐⭐

Difficulty: Expert
Time: 5-8 hours
Topic: Comprehensive debugging

Scenario

Build a comprehensive debugging toolkit for ML pipelines!

Requirements

Automated Bug Detection


class MLPipelineDebugger:
    def check_data_leakage(self, pipeline):
        # Detect common leakage patterns
        pass
    
    def check_feature_target_alignment(self, X, y):
        # Verify alignment
        pass
    
    def check_scaling_issues(self, scaler, X_train, X_test):
        # Detect scaling problems
        pass
    
    def check_class_imbalance(self, y):
        # Identify severe imbalance
        pass
    
    def check_convergence(self, model):
        # Verify model converged
        pass

Performance Profiler
- CPU time tracking
- Memory usage monitoring
- Bottleneck identification
- Optimization suggestions
Model Health Checker
- Overfitting detection
- Underfitting detection
- Learning curve analysis
- Validation curve generation
Error Analyzer
- Automatic confusion matrix
- Per-class metrics
- Failure case collection
- Confidence analysis
Report Generator
- Create HTML report
- Include all analyses
- Actionable recommendations
- Export to PDF
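To give a feel for what the `MLPipelineDebugger` checkers might return, here is a partial sketch of three of them. The class name `MLPipelineDebuggerSketch`, the return formats, and the `max_ratio` threshold are assumptions, not a specification. The alignment check only catches structural mismatches (lengths, pandas indexes); it cannot detect NumPy arrays that were shuffled independently.

```python
import numpy as np
import pandas as pd


class MLPipelineDebuggerSketch:
    """Illustrative partial implementation; thresholds are assumptions."""

    def check_feature_target_alignment(self, X, y):
        """Return a list of structural alignment problems (empty if none)."""
        issues = []
        if len(X) != len(y):
            issues.append(f"length mismatch: X has {len(X)} rows, y has {len(y)}")
        pandas_types = (pd.DataFrame, pd.Series)
        if isinstance(X, pandas_types) and isinstance(y, pandas_types):
            if not X.index.equals(y.index):
                issues.append("X and y have different pandas indexes")
        return issues

    def check_class_imbalance(self, y, max_ratio=10.0):
        """Report class counts and the majority/minority ratio."""
        classes, counts = np.unique(np.asarray(y), return_counts=True)
        ratio = counts.max() / counts.min()
        return {
            "counts": dict(zip(classes.tolist(), counts.tolist())),
            "ratio": float(ratio),
            "imbalanced": bool(ratio > max_ratio),
        }

    def check_convergence(self, model):
        """Heuristic: True if a fitted model stopped before max_iter.

        Returns None when the model does not expose n_iter_ and max_iter.
        """
        n_iter = getattr(model, "n_iter_", None)
        max_iter = getattr(model, "max_iter", None)
        if n_iter is None or max_iter is None:
            return None
        return bool(np.max(n_iter) < max_iter)


debugger = MLPipelineDebuggerSketch()
X = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0]})
y = pd.Series([0, 1, 0, 1])

aligned_issues = debugger.check_feature_target_alignment(X, y)
reversed_issues = debugger.check_feature_target_alignment(X, y.iloc[::-1])
imbalance = debugger.check_class_imbalance([0] * 95 + [1] * 5)

print(aligned_issues)
print(reversed_issues)
print(imbalance)
```

Returning plain lists and dictionaries keeps each checker easy to test and easy to feed into the report generator.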

Deliverables

ml_debugger.py - Complete toolkit
Test suite with 10+ test cases
Documentation with examples
Sample reports (HTML/PDF)
Tutorial notebook

Success Criteria

✅ All checkers implemented
✅ Catches common bugs
✅ Works with sklearn models
✅ Generates useful reports
✅ Well documented

Optional Stretch (⭐⭐⭐)

Support PyTorch/TensorFlow
Add interactive visualizations
Create CLI tool
Publish as package
Add CI/CD integration

🏆 Completion Tracker

Track your progress:

- [ ] Challenge 1: The Mystery Bug ⭐⭐
- [ ] Challenge 2: Data Detective ⭐⭐⭐
- [ ] Challenge 3: Speed Demon ⭐⭐⭐
- [ ] Challenge 4: Convergence Crisis ⭐⭐⭐⭐
- [ ] Challenge 5: Error Analyzer ⭐⭐⭐⭐
- [ ] Challenge 6: The Production Mystery ⭐⭐⭐⭐⭐
- [ ] Challenge 7: Debug the Debugger ⭐⭐⭐⭐⭐

💡 Tips

Start Simple: Begin with Challenge 1 and work through the challenges in order
Document Everything: Keep notes on what you tried
Measure First: Always establish a baseline before optimizing
Test Thoroughly: Verify fixes work with different data
Learn from Failures: Every bug teaches something valuable

Happy Debugging! 🐛→🦋