Essay

How does acquiring targeted training data address a data mismatch problem?

Question: In the context of machine learning, explain how acquiring targeted training data can help address a data mismatch problem. Use the speech recognition example to illustrate your explanation.

Sample answer: When a data mismatch occurs, an algorithm performs poorly on dev set examples because they differ significantly from the training distribution. Acquiring targeted training data that matches the difficult dev examples can help bridge this gap. For instance, in a speech recognition system where training data consists of quiet background audio but the dev set features in-car audio, the algorithm will likely struggle with the dev set. By deliberately collecting and adding more training data recorded inside a car, the model can learn to handle this specific noisy environment, thereby reducing the data mismatch and improving performance.

Key points:

  • Define data mismatch as a discrepancy between training and dev distributions.
  • Explain the strategy of collecting new training data that resembles difficult dev set examples.
  • Use the speech recognition example contrasting quiet background training data with in-car dev data.
  • Conclude that matching the training data to the dev data improves performance on those specific difficult cases.

Rubric: The answer should define data mismatch, state the strategy of acquiring matching training data, and correctly apply the speech recognition example.

0

1

Updated 2026-06-07

Contributors are:

Who are from:

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI

Related