Essay

Explain how specific error comparisons point to data mismatch.

Question: Explain how comparing the 1.5% error on unseen training-distribution data to both the 1% training error and the 10% dev-set error isolates data mismatch as the primary problem.

Sample answer: The small gap between the 1% training error and the 1.5% error on unseen training-distribution data (0.5 percentage points) indicates that the model generalizes well within the training distribution and does not suffer from high variance (overfitting). The large gap between that 1.5% error and the 10% dev-set error (8.5 percentage points) shows that the model struggles to generalize to the dev-set distribution. Because high variance has been ruled out, this gap is best attributed to a mismatch between the training and dev data distributions.

Key points:

  • Comparing 1% training error to 1.5% unseen same-distribution error shows low variance.
  • Comparing 1.5% unseen same-distribution error to 10% dev error isolates the distribution shift.
  • The conclusion is that the algorithm fails on the dev set specifically because its data distribution differs from the training data.
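
The two comparisons can be sketched as a small calculation. This is a minimal illustration, not a standard routine: the function name `diagnose` and the 2-percentage-point `threshold` for calling a gap "large" are assumptions chosen for this example, and a sensible threshold depends on the task.

```python
def diagnose(train_err, train_dev_err, dev_err, threshold=0.02):
    """Split the train-to-dev error gap into two parts.

    train_err:     error on the training set
    train_dev_err: error on unseen data from the training distribution
    dev_err:       error on the dev set
    threshold:     hypothetical cutoff for treating a gap as significant
    """
    # Gap 1: same distribution, seen vs. unseen data -> variance
    variance_gap = train_dev_err - train_err
    # Gap 2: both unseen, different distributions -> data mismatch
    mismatch_gap = dev_err - train_dev_err

    findings = []
    if variance_gap > threshold:
        findings.append("high variance")
    if mismatch_gap > threshold:
        findings.append("data mismatch")
    return variance_gap, mismatch_gap, findings


# Error rates from the question: 1% train, 1.5% unseen same-distribution, 10% dev
variance_gap, mismatch_gap, findings = diagnose(0.01, 0.015, 0.10)
print(f"variance gap: {variance_gap:.1%}")
print(f"mismatch gap: {mismatch_gap:.1%}")
print(findings)
```

With the numbers from the question, only the second gap exceeds the assumed threshold, so the sketch flags data mismatch and not variance.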

Rubric: The response should explicitly define the purpose of both comparisons (train vs. unseen same-distribution, unseen same-distribution vs. dev) and clearly conclude that the model generalizes to the training distribution but fails on the dev distribution.

Updated 2026-05-27

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI