Learn Before
Analyzing Computational and Representational Costs of Unhelpful Training Data
Question: Explain how incorporating training data that has no benefit impacts a machine learning system's training process, specifically focusing on computational resources and neural-network representation capacity. Ground your answer in the concept of excluding data that deviates entirely from the dev/test distribution.
Sample answer: According to the source, adding training data that has no benefit should be left out for computational reasons. Including irrelevant data wastes valuable computational resources during training. Furthermore, including such data wastes the neural-network's representation capacity, as the network dedicates its parameters to learning and representing features that are irrelevant to the dev/test distribution.
Key points:
- Data with no benefit should be left out of training for computational reasons.
- Including irrelevant data wastes neural-network representation capacity.
- Irrelevant data forces the network to learn features that do not align with the dev/test distribution.
Rubric: A satisfactory response must mention that unhelpful training data consumes/wastes computational resources during training and also wastes the neural-network's representation capacity by forcing it to represent patterns that do not benefit the dev/test distribution.
0
1
References
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Machine Learning Yearning (Deeplearning.ai)
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Related
In the cat-detector example, why does Machine Learning Yearning recommend excluding scanned historical documents that look nothing like the dev/test distribution?
True or False: Adding more training data always improves model performance when training, dev, and test sets share the same distribution.
According to Machine Learning Yearning, if the dev error curve has _____ (i.e., flattened out), adding more training data will not help you reach your performance goal.
Why does Machine Learning Yearning recommend leaving out training data that has no benefit for your model?
True or False: According to Machine Learning Yearning, adding more training data can actually hurt model performance.
If the dev error curve has _____, you can immediately tell that adding more training data won't help reach your performance goal.
Match each concept to its correct description in Machine Learning Yearning's discussion of when adding training data does not help.
Order the steps for using a learning curve to decide whether to collect more training data, as described in Machine Learning Yearning.
In the cat-detector example, why should a large collection of scanned historical documents be excluded from training?
True or False: According to Machine Learning Yearning, inspecting the learning curve can prevent wasting months collecting data that turns out not to help.
According to Machine Learning Yearning, data that has no _____ should be left out of training for computational reasons.
Match each training data scenario to the recommended action from Machine Learning Yearning.
Order the reasoning steps for evaluating whether a new data source (e.g., scanned documents) should be added to training, per Machine Learning Yearning.
Analyzing Computational and Representational Costs of Unhelpful Training Data
Evaluating the Inclusion of Historical Document Scans in a Casual Cat-Detector Model
Neural Network Capacity and Irrelevant Training Data