Pre-Quiz: Model Evaluation & Metrics

Test your baseline knowledge before starting Phase 16!

Time: 10 minutes
Questions: 10
Passing Score: 60%

Questions

1. What is the main limitation of using accuracy for imbalanced datasets?

A) Accuracy is too difficult to calculate
B) Accuracy can be misleadingly high even if the model doesn’t work well
C) Accuracy only works for regression problems
D) Accuracy requires too much computational power

Correct Answer: B

Explanation: With imbalanced data (e.g., 99% negative, 1% positive), a model that always predicts “negative” achieves 99% accuracy but is completely useless for detecting the positive class.
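
The effect described above can be reproduced with a small sketch. The data here is hypothetical (990 negatives, 10 positives), and the "model" is simply a constant prediction, not a trained classifier.

```python
# Hypothetical imbalanced labels: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10
# A "model" that always predicts the negative class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the positive class: fraction of actual positives that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
recall = true_positives / actual_positives

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Accuracy looks excellent while recall on the positive class is zero, which is why accuracy alone is a poor guide on imbalanced data.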

2. Which metric answers: “Of all positive predictions, how many were correct?”

A) Recall
B) Precision
C) F1-Score
D) Accuracy

Correct Answer: B

Explanation: Precision = TP / (TP + FP), which measures the proportion of positive predictions that are actually correct.
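
As a minimal sketch, the formula can be computed directly from label lists. The function name, the sample labels, and the choice to return 0.0 when there are no positive predictions are assumptions made for illustration.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP) for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    if tp + fp == 0:
        # Assumption: report 0.0 when the model makes no positive predictions.
        return 0.0
    return tp / (tp + fp)

# Hypothetical labels: 4 positive predictions, 2 of them correct.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = precision(y_true, y_pred)
print(score)
```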

3. What does RMSE stand for?

A) Root Mean Square Estimation
B) Regression Mean Squared Error
C) Root Mean Squared Error
D) Random Mean Square Evaluation

Correct Answer: C

Explanation: RMSE = Root Mean Squared Error, calculated as sqrt(MSE).
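
A short sketch of the calculation, using hypothetical values. MAE is included only for comparison, to show that squaring makes RMSE more sensitive to a single large error.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def mae(y_true, y_pred):
    """Mean Absolute Error, shown for comparison."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: three exact predictions and one error of 4.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]

rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
print(rmse_value, mae_value)
```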

4. Why is BLEU score primarily used for machine translation?

A) It only works with translated text
B) It measures n-gram overlap between generated and reference text
C) It can only evaluate English text
D) It measures grammatical correctness

Correct Answer: B

Explanation: BLEU compares n-gram overlap between generated and reference text, making it suitable for translation where lexical similarity matters.
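
The n-gram overlap idea can be sketched as a clipped n-gram precision, which is one building block of BLEU. This is not a full BLEU implementation (it omits the brevity penalty and the combination across n-gram orders), and the helper names and sentences are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference.

    Each n-gram is credited at most as often as it occurs in the reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total

reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()

p1 = clipped_precision(candidate, reference, 1)  # unigram overlap
p2 = clipped_precision(candidate, reference, 2)  # bigram overlap
print(p1, p2)
```

One wrong word costs a single unigram match but breaks two bigrams, so higher-order n-grams are stricter about word order.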

5. What is demographic parity in fairness metrics?

A) All groups must have equal prediction accuracy
B) All groups must have equal positive outcome rates
C) All groups must have the same features
D) All groups must be the same size

Correct Answer: B

Explanation: Demographic parity requires that the positive outcome rate (e.g., approval rate) is equal across different demographic groups.
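
One way to check this, sketched below, is to compare positive prediction rates per group. The group names and predictions are hypothetical, and reporting the absolute gap between rates is one common choice rather than the only one.

```python
def positive_rate(predictions):
    """Fraction of predictions that are positive (1)."""
    return sum(predictions) / len(predictions)

# Hypothetical binary predictions (1 = approved) for two demographic groups.
group_a = [1, 1, 0, 1, 0]
group_b = [1, 0, 0, 0, 1]

rate_a = positive_rate(group_a)
rate_b = positive_rate(group_b)

# Demographic parity holds when the rates are equal, i.e. the gap is 0.
parity_gap = abs(rate_a - rate_b)
print(rate_a, rate_b, parity_gap)
```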

6. What does an R² score of 0 mean?

A) Perfect predictions
B) Model is as good as predicting the mean
C) Model is completely wrong
D) Cannot calculate R²

Correct Answer: B

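Explanation: R² = 0 means the model explains none of the variance in the target: it performs no better than predicting the mean of the target for every sample. R² can also be negative when the model does worse than that baseline.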
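
A small sketch using the standard definition R² = 1 - SS_res / SS_tot makes the baseline concrete. The function name and the sample values are hypothetical.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [2.0, 4.0, 6.0, 8.0]

r2_mean = r2_score(y_true, [5.0, 5.0, 5.0, 5.0])     # always predict the mean
r2_perfect = r2_score(y_true, y_true)                # exact predictions
r2_reversed = r2_score(y_true, [8.0, 6.0, 4.0, 2.0]) # worse than the mean
print(r2_mean, r2_perfect, r2_reversed)
```

Predicting the mean gives 0, perfect predictions give 1, and predictions worse than the mean baseline give a negative score.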

7. Why use cross-validation instead of a single train/test split?

A) It’s faster to compute
B) It uses less data
C) It provides a more reliable performance estimate
D) It always gives better accuracy

Correct Answer: C

Explanation: Cross-validation averages performance over several train/test splits, so the estimate depends less on which samples happen to land in a single test set.
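
The sketch below shows how much a score can differ from split to split. It assumes a toy "model" that predicts the mean of the training targets, scored with mean absolute error, and uses contiguous folds without shuffling to keep the arithmetic easy to follow. The helper name and data are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

# Hypothetical regression targets.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

fold_scores = []
for train_idx, test_idx in k_fold_indices(len(y), k=3):
    # Toy model: predict the mean of the training targets.
    prediction = sum(y[i] for i in train_idx) / len(train_idx)
    # Score the held-out fold with mean absolute error.
    fold_mae = sum(abs(y[i] - prediction) for i in test_idx) / len(test_idx)
    fold_scores.append(fold_mae)

mean_score = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_score)
```

A single split could have reported an error of 0.5 or 3.0 depending on which fold was held out; the average over folds is a steadier summary.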

8. What is the primary difference between BLEU and ROUGE?

A) BLEU is for classification, ROUGE is for regression
B) BLEU is precision-focused, ROUGE is recall-focused
C) BLEU only works in English
D) ROUGE is more accurate

Correct Answer: B

Explanation: BLEU emphasizes precision (how much of the generated text appears in the reference), while ROUGE emphasizes recall (how much of the reference appears in the generated text).
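
The difference shows up clearly on a very short output. This sketch uses plain unigram overlap as a stand-in for both metrics; it is not a full BLEU or ROUGE implementation, and the sentences are hypothetical.

```python
from collections import Counter

def overlap_count(candidate, reference):
    """Number of candidate tokens that also occur in the reference (clipped by count)."""
    return sum((Counter(candidate) & Counter(reference)).values())

reference = "the cat sat on the mat".split()
candidate = "the cat".split()  # short but entirely correct

matches = overlap_count(candidate, reference)

# Precision-style view (BLEU): share of generated tokens found in the reference.
precision = matches / len(candidate)
# Recall-style view (ROUGE): share of reference tokens covered by the output.
recall = matches / len(reference)
print(precision, recall)
```

The short candidate scores perfectly on precision but covers only a third of the reference. BLEU includes a brevity penalty to counter this kind of overly short output.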

9. What does a low perplexity score indicate?

A) Model is confused
B) Model is confident/text is familiar
C) Model needs more training
D) Text is too complex

Correct Answer: B

Explanation: Low perplexity means the model assigns high probability to the actual tokens, indicating confidence and familiarity with the text.
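
Perplexity is the exponential of the average negative log-probability assigned to the actual tokens. The sketch below assumes the per-token probabilities are already available; the values are hypothetical and not taken from any real model.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities a model assigned to each actual token.
confident = [0.5, 0.5, 0.5, 0.5]
uncertain = [0.1, 0.1, 0.1, 0.1]

ppl_confident = perplexity(confident)
ppl_uncertain = perplexity(uncertain)
print(ppl_confident, ppl_uncertain)
```

Higher token probabilities give lower perplexity: a perplexity of about 2 means the model is, on average, as uncertain as if it were choosing between two equally likely tokens.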

10. What is the “80% rule” in fairness testing?

A) Model must be 80% accurate
B) The ratio of positive outcomes between groups should be ≥ 0.8
C) 80% of features must be fair
D) Training data must be 80% balanced

Correct Answer: B

Explanation: The 80% rule (or four-fifths rule) states that the selection rate for any protected group should be at least 80% of the selection rate for the group with the highest rate.
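
A minimal sketch of the check, assuming selection rates per group have already been computed. The function name, group names, and rates are hypothetical.

```python
def selection_rate_ratios(selection_rates):
    """Divide each group's selection rate by the highest group's rate."""
    highest = max(selection_rates.values())
    if highest == 0:
        raise ValueError("No group has a positive selection rate.")
    return {group: rate / highest for group, rate in selection_rates.items()}

# Hypothetical selection rates (e.g., share of applicants approved).
rates = {"group_a": 0.50, "group_b": 0.35}

ratios = selection_rate_ratios(rates)
# The rule is satisfied only if every group's ratio is at least 0.8.
passes_rule = all(ratio >= 0.8 for ratio in ratios.values())
print(ratios, passes_rule)
```

Here group_b is selected at 70% of group_a's rate, which falls below the 0.8 threshold.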

Self-Check Guide

9-10 correct: Strong foundation for this phase.
7-8 correct: Good baseline; review a few concepts before starting.
5-6 correct: Moderate baseline; pay extra attention during Phase 16.
0-4 correct: Review prerequisite materials before Phase 16.

Key Topics to Review

If you scored low, review these topics:

Classification metrics (precision, recall, F1, accuracy)
Regression metrics (MAE, RMSE, R²)
Fairness concepts (demographic parity, equalized odds)
LLM evaluation (BLEU, ROUGE, perplexity)
Cross-validation basics

Ready to start Phase 16? Let’s go!