1Cademy - Adding More Training Data Does Not Always Help

Learn Before

Bias and Variance as Two Major Sources of Error
Choosing Dev and Test Sets to Reflect Future Data

Concept

Adding More Training Data Does Not Always Help

Data that offers no benefit should be left out for computational reasons. In the cat-detector example, scanned historical documents that contain nothing resembling a cat, and that look completely unlike the dev/test distribution, provide negligible benefit. Including them would waste computational resources and the neural network's representational capacity.
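As a rough illustration of the idea, the sketch below drops candidate data sources whose estimated relevance to the dev/test distribution falls under a threshold. Everything here is hypothetical: the `DataSource` type, the `select_sources` helper, the source names, the example counts, the relevance scores, and the 0.1 cutoff are made up for illustration, and in practice the relevance of a source would have to be judged by inspection or measured experimentally.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    num_examples: int
    # Hypothetical 0..1 estimate of how closely this source resembles
    # the dev/test distribution (assumed to be supplied by the practitioner).
    relevance: float


def select_sources(sources, min_relevance=0.1):
    """Keep only the sources whose estimated relevance meets the threshold."""
    return [s for s in sources if s.relevance >= min_relevance]


# Illustrative, made-up candidates for a cat detector.
sources = [
    DataSource("mobile_cat_photos", 10_000, 0.95),
    DataSource("internet_cat_images", 200_000, 0.60),
    DataSource("scanned_historical_documents", 500_000, 0.01),
]

kept = select_sources(sources)
kept_names = [s.name for s in kept]

# Examples that never need to be loaded or trained on.
skipped_examples = sum(s.num_examples for s in sources if s not in kept)

print(kept_names)
print(skipped_examples)
```

With these made-up numbers, the scanned documents are excluded, so the model never spends computation or capacity on the 500,000 examples that have nothing to do with the task.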

Updated 2026-05-26
