Learn Before
Developing an Image-to-Text Model
Case context: A machine learning team wants to build a system that takes an image as input and generates a descriptive sentence as output. They are considering an end-to-end deep learning approach.
Question: Based on the concept of directly learning rich outputs, what specific data must the team possess to successfully train this end-to-end system, and how does ML Yearning classify the type of output this system generates?
Sample answer: The team must possess a dataset consisting of the right labeled (input, output) pairs, specifically pairs of images and their corresponding descriptive sentences. ML Yearning classifies the generated sentence as a 'rich output,' meaning it is an output much more complex than a single number.
Key points:
- The team needs the right labeled input-output pairs.
- The output (a sentence) is classified as a rich output.
- Rich outputs are more complex than predicting a number.
Rubric: The learner should state the requirement for labeled input-output pairs (specifically images paired with sentences) and classify the sentence output as a 'rich output' or complex output.
0
1
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
End-to-End Image Captioning
End-to-End Text-to-Speech
End-to-End Question Answering
End-to-End Machine Translation as Rich Output Learning
End-to-End Speech Recognition as Rich Output Learning
Which of the following best describes the outputs that end-to-end deep learning can directly learn?
End-to-end deep learning is limited to predicting outputs that are single numbers.
To train an end-to-end system that produces rich outputs, you need the right labeled _____ pairs.
Match each output category to an example of a rich output in end-to-end deep learning.
Order the reasoning steps a practitioner follows when deciding whether end-to-end learning can produce a rich output.
Which condition does ML Yearning identify as the key prerequisite for end-to-end learning to produce rich outputs?
A sentence is an example of a rich output that end-to-end deep learning can learn to produce directly.
ML Yearning describes the ability to learn rich outputs end-to-end as 'an accelerating _____ in deep learning.'
Match each end-to-end deep learning application to the type of rich output it produces.
Order the steps for building an end-to-end deep learning system that produces a rich output such as a translated sentence.
Significance of Learning Rich Outputs
Developing an Image-to-Text Model
End-to-End Deep Learning Output Trend