Pre-Quiz: Debugging & Troubleshooting
Test your baseline knowledge before starting Phase 17.
Time: 10 minutes
Questions: 10
Passing Score: 60%
Instructions
Answer each question and check your responses. This helps identify areas to focus on.
Questions
1. What is the first step in debugging an ML model that’s not learning?
A) Increase model complexity
B) Reproduce the bug consistently
C) Add more data
D) Try a different algorithm
Correct Answer: B
Explanation: The first step in any debugging workflow is to reproduce the bug consistently, because you can’t fix what you can’t reliably observe. Once the issue reproduces on demand, you can gather data, form hypotheses about the cause, and test fixes.
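One common way to make an ML bug reproducible is to pin every source of randomness before the run. The sketch below assumes a pure-Python stand-in for training. `set_seed` and `noisy_training_run` are hypothetical helpers, and the NumPy seeding call only runs if NumPy is installed. Other frameworks (and GPU kernels) may need their own seeding or determinism settings, which are not shown here.

```python
import random


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed the RNGs this sketch knows about."""
    random.seed(seed)
    try:  # optional: only if NumPy is installed
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass


def noisy_training_run(seed: int, n_steps: int = 5) -> list:
    """Hypothetical stand-in for a training run whose result depends on randomness."""
    set_seed(seed)
    data = list(range(10))
    losses = []
    for _ in range(n_steps):
        random.shuffle(data)  # e.g. shuffling mini-batches
        losses.append(sum(x * random.random() for x in data[:3]))
    return losses


run_a = noisy_training_run(seed=42)
run_b = noisy_training_run(seed=42)
print(run_a == run_b)  # prints True: same seed, same sequence of "losses"
```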
2. Which of the following is a sign of data leakage?
A) Test accuracy much lower than training accuracy
B) Test accuracy suspiciously close to or higher than training accuracy
C) Model takes long to train
D) Missing values in the dataset
Correct Answer: B
Explanation: Data leakage occurs when information that would not be available at prediction time, such as test-set rows or the target itself, influences training. It typically produces unrealistically high test accuracy that won’t hold up on genuinely new data. If test accuracy is very close to or exceeds training accuracy, suspect leakage.
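Two quick heuristics can surface leakage early: count test rows that also appear verbatim in the training set, and flag a test score that matches or beats the training score. The helper names, the toy rows, and the 0.01 margin below are illustrative assumptions; a flag is a prompt to investigate, not proof of leakage.

```python
def duplicate_fraction(train_rows, test_rows):
    """Hypothetical helper: fraction of test rows that also appear verbatim in training."""
    train_keys = {tuple(row) for row in train_rows}
    duplicates = sum(1 for row in test_rows if tuple(row) in train_keys)
    return duplicates / len(test_rows)


def scores_look_suspicious(train_acc, test_acc, margin=0.01):
    """Hypothetical helper: True when test accuracy is within `margin` of, or above, training accuracy."""
    return test_acc >= train_acc - margin


train_rows = [[1, 2], [3, 4], [5, 6], [7, 8]]
test_rows = [[3, 4], [9, 10]]

print(duplicate_fraction(train_rows, test_rows))              # 0.5
print(scores_look_suspicious(train_acc=0.97, test_acc=0.98))  # True
```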
3. What does a large gap between training and validation accuracy indicate?
A) Underfitting
B) Overfitting
C) Good generalization
D) Data leakage
Correct Answer: B
Explanation: A large gap (training accuracy much higher than validation accuracy) indicates overfitting: the model has memorized the training data but doesn’t generalize well. Common remedies include regularization, more training data, or a less complex model.
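To see where overfitting starts, you can track the gap across epochs. This sketch assumes per-epoch accuracy lists are already available; the function name, the 0.05 gap threshold, and the numbers are made up for illustration. The epoch with the best validation accuracy is a natural candidate for early stopping.

```python
def overfitting_report(train_acc, val_acc, max_gap=0.05):
    """Hypothetical helper: summarise the train/validation gap from per-epoch histories."""
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i])
    final_gap = train_acc[-1] - val_acc[-1]
    return {
        "best_epoch": best_epoch,  # 0-based epoch with the highest validation accuracy
        "final_gap": round(final_gap, 4),
        "overfitting": final_gap > max_gap,
    }


# Illustrative history: training accuracy keeps rising while validation peaks, then falls.
train_acc = [0.70, 0.80, 0.88, 0.94, 0.98, 0.99]
val_acc = [0.68, 0.76, 0.81, 0.82, 0.80, 0.78]

print(overfitting_report(train_acc, val_acc))
# {'best_epoch': 3, 'final_gap': 0.21, 'overfitting': True}
```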
4. Which tool would you use for CPU profiling in Python?
A) memory_profiler
B) cProfile
C) pdb
D) pytest
Correct Answer: B
Explanation: cProfile is Python’s built-in CPU profiler that shows where your code spends time. memory_profiler is for memory, pdb is for interactive debugging, and pytest is for testing.
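A minimal cProfile session inside a script might look like the sketch below. `slow_sum_of_squares` is a hypothetical stand-in for your own code; `pstats`, also in the standard library, sorts and prints the collected statistics.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical hot spot: a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(200_000)
profiler.disable()

# Sort by cumulative time and keep the top 5 entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

To profile a whole script without changing its code, `python -m cProfile -s cumulative your_script.py` gives a similar report.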
5. What’s the primary benefit of vectorization in ML code?
A) Easier to read
B) Uses less memory
C) Much faster execution
D) Better accuracy
Correct Answer: C
Explanation: Vectorization replaces Python loops with NumPy array operations that run in compiled code, avoiding per-element interpreter overhead. This often yields a 10-100x speedup, depending on the operation and array size, and can also take advantage of SIMD instructions.
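The comparison below sketches the idea, assuming NumPy is installed: the same sum of squares computed with a Python loop and with a single `np.dot` call. Timings depend on the machine and array size, so none are quoted here; the check at the end only confirms both versions agree.

```python
import math
import time

import numpy as np

values = np.random.default_rng(0).random(200_000)


def loop_sum_of_squares(arr):
    """Pure-Python loop: one interpreter round-trip per element."""
    total = 0.0
    for v in arr:
        total += v * v
    return float(total)


def vectorized_sum_of_squares(arr):
    """Vectorized: the loop runs inside NumPy's compiled code."""
    return float(np.dot(arr, arr))


start = time.perf_counter()
loop_result = loop_sum_of_squares(values)
loop_time = time.perf_counter() - start

start = time.perf_counter()
vec_result = vectorized_sum_of_squares(values)
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.6f}s")
print(math.isclose(loop_result, vec_result, rel_tol=1e-9))
```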
6. Why is it important to scale features before training?
A) To make the model train faster
B) To ensure features are on similar scales for convergence
C) To reduce overfitting
D) To handle missing values
Correct Answer: B
Explanation: Features with very different ranges can cause convergence problems, especially for gradient-based algorithms. Weights attached to large-range features receive much larger gradient updates than those attached to small-range features, so a learning rate that keeps the former stable makes the latter learn very slowly.
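Standardization (z-scoring) is one common way to put features on similar scales. This pure-Python sketch uses hypothetical helper names and made-up data. It fits the mean and standard deviation on the training rows only and reuses them for the test rows, which also avoids leaking test-set statistics into training. In practice a library scaler, such as scikit-learn’s `StandardScaler`, does the same job.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Hypothetical helper: per-column means and standard deviations from `rows`."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Hypothetical helper: apply (value - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Two features on very different scales: years of experience and salary.
train = [[1.0, 20_000.0], [2.0, 50_000.0], [3.0, 80_000.0]]
test = [[2.5, 65_000.0]]

means, stds = fit_standardizer(train)         # fit on training data only
train_scaled = standardize(train, means, stds)
test_scaled = standardize(test, means, stds)  # reuse the training statistics
print(test_scaled)
```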
7. What does the diagonal of a confusion matrix represent?
A) False positives
B) False negatives
C) Correct predictions
D) Total predictions
Correct Answer: C
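Explanation: The diagonal elements of a confusion matrix count correct predictions: cell (i, i) is the number of samples of class i that were predicted as class i. In the binary case these are the true negatives and true positives. Off-diagonal elements count errors (in the binary case, false positives and false negatives).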
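The sketch below builds a confusion matrix by hand, with rows as actual classes and columns as predicted classes (the same layout scikit-learn’s `confusion_matrix` uses), so you can check that the diagonal counts exactly the correct predictions. The labels and predictions are made up, and this `confusion_matrix` is a hypothetical stand-alone helper, not the library function.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Hypothetical helper: rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
correct = sum(cm[i][i] for i in range(len(cm)))  # sum of the diagonal

print(cm)       # [[3, 1], [1, 3]]  -> [[TN, FP], [FN, TP]]
print(correct)  # 6 of 8 predictions are correct
```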
8. If your model has precision=0.95 and recall=0.40, what should you do?
A) Model is perfect, do nothing
B) Focus on reducing false negatives
C) Focus on reducing false positives
D) Collect more data
Correct Answer: B
Explanation: Low recall (0.40) means the model misses many positive cases (many false negatives). High precision (0.95) means that when it predicts positive, it’s usually right. To improve recall, you might lower the decision threshold or address class imbalance.
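To see how lowering the decision threshold trades precision for recall, the sketch computes both metrics at two thresholds on a small set of made-up scores. The helper name and both threshold values are illustrative; in practice you would choose the threshold on a validation set.

```python
def precision_recall_at(y_true, scores, threshold):
    """Hypothetical helper: precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.45, 0.35, 0.2, 0.25, 0.32, 0.1, 0.05, 0.4]

print(precision_recall_at(y_true, scores, threshold=0.5))  # (1.0, 0.4)
print(precision_recall_at(y_true, scores, threshold=0.3))  # precision ~0.67, recall 0.8
```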
9. What’s usually the best way to handle a column in which more than 50% of the values are missing?
A) Fill with mean
B) Fill with median
C) Consider dropping the column
D) Fill with mode
Correct Answer: C
Explanation: When more than 50% of values are missing, the column provides little information and imputation would be mostly guessing. It’s often better to drop such columns unless the missingness itself is meaningful.
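A quick way to apply this rule is to compute the missing fraction per column and drop the columns above the cut-off. This sketch uses `None` for missing values and made-up records; the 0.5 cut-off mirrors the question rather than a universal rule, and `missing_fraction` is a hypothetical helper. If the missingness itself might carry signal, you could keep a binary "was missing" indicator before dropping the column.

```python
def missing_fraction(records, column):
    """Hypothetical helper: share of records where `column` is None or absent."""
    missing = sum(1 for record in records if record.get(column) is None)
    return missing / len(records)


records = [
    {"age": 34, "income": None, "zip": "10001"},
    {"age": 41, "income": None, "zip": None},
    {"age": None, "income": None, "zip": "94105"},
    {"age": 29, "income": 52000, "zip": "60601"},
]
columns = ["age", "income", "zip"]

fractions = {c: missing_fraction(records, c) for c in columns}
to_drop = [c for c, f in fractions.items() if f > 0.5]
kept = [{k: v for k, v in r.items() if k not in to_drop} for r in records]

print(fractions)  # {'age': 0.25, 'income': 0.75, 'zip': 0.25}
print(to_drop)    # ['income']
```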
10. What does it indicate when a learning curve shows low training and validation scores?
A) Overfitting
B) Underfitting
C) Perfect fit
D) Data leakage
Correct Answer: B
Explanation: When both training and validation scores are low and stop improving as more data is added, the model is underfitting (high bias): it is too simple to capture the patterns. Possible fixes include increasing model complexity, adding more informative features, or reducing regularization.
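A rough programmatic version of this check might look like the sketch below. It assumes you already have mean training and validation scores at increasing training-set sizes, for example from scikit-learn’s `learning_curve`. The function name, the 0.8 target, and the tolerances are illustrative choices, not standard values.

```python
def looks_underfit(train_scores, val_scores, target=0.8, max_gap=0.05, plateau_tol=0.01):
    """Hypothetical heuristic: both curves end below `target`, sit close together,
    and validation has stopped improving between the last two training sizes."""
    final_train, final_val = train_scores[-1], val_scores[-1]
    both_low = final_train < target and final_val < target
    close_together = final_train - final_val <= max_gap
    plateaued = final_val - val_scores[-2] < plateau_tol
    return both_low and close_together and plateaued


# Mean scores at increasing training-set sizes (made-up numbers).
train_scores = [0.62, 0.63, 0.63, 0.64]
val_scores = [0.58, 0.60, 0.61, 0.61]

print(looks_underfit(train_scores, val_scores))  # True
```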
Self-Check Guide
Count your correct answers:
- 9-10 correct: Strong foundation for this phase.
- 7-8 correct: Good starting point; review a few concepts before diving in.
- 5-6 correct: Moderate baseline; pay extra attention to the weaker topics.
- 0-4 correct: Review prerequisite material before Phase 17.
Key Topics to Review
Based on your score, focus on:
If you missed 1-3:
- General debugging workflow
- Common ML pitfalls
If you missed 4-5:
- Data quality issues
- Model evaluation metrics
- Performance optimization basics
If you missed 6-8:
- Review Phase 16 (Model Evaluation)
- Study debugging fundamentals
- Practice with simple examples
If you missed 9-10:
- Complete prerequisite phases first
- Review Python and ML basics
- Start with simpler debugging exercises
Next Steps
- Review any questions you got wrong
- Check the explanations carefully
- Read relevant notebook sections
- Begin Phase 17 when ready!
Good luck with Phase 17!