Case Study

Evaluate a team's struggles with data synthesis for an image classifier.

Case context: Your team has spent three weeks generating synthetic images for a computer vision project. So far, adding these synthetic images to the training set has not improved the model's accuracy on the dev set. Some team members want to abandon data synthesis, arguing it is a waste of time.

Question: Based on Andrew Ng's advice, what should you diagnose as the likely issue, and what decision should you make regarding the synthetic data effort?

Sample answer: The likely issue is that the details of the synthetic images are not yet close enough to the actual distribution. Andrew Ng notes that this process can take weeks before producing a significant effect. Therefore, the team should not immediately abandon the effort, but instead focus on diagnosing what details are missing and refining the synthetic data generation to better match the real distribution, because getting it right will provide a massive increase in training data.

Key points:

  • The synthetic details likely do not match the actual distribution yet.
  • It is common for this process to take weeks before seeing a significant effect.
  • The team should continue refining the data due to the potential payoff of a much larger training set.

Rubric: The response should identify that the synthetic data details likely do not match the real distribution yet, note that spending weeks without success is a common challenge, and recommend refining the data rather than abandoning the effort.

0

1

Updated 2026-06-12

Contributors are:

Who are from:

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI