Classifying a Voice Assistant's Output and Determining End-to-End Data Requirements
Case context: A machine learning team is building a new voice assistant. They intend to use an end-to-end deep learning architecture that directly processes spoken commands and generates complete text sentences for downstream processing.
Question: Based on Machine Learning Yearning, how would you classify the output of this speech recognition system, and what specific data must the team collect to successfully train this end-to-end model?
Sample answer: The output of this system is classified as a "rich output" because a sentence transcription is more complex than a single number. To successfully train this end-to-end model, the team must collect a dataset consisting of the right labeled (input, output) pairs, specifically pairs of audio recordings and their corresponding text transcriptions.
Key points:
- Classifies the transcription output as a rich output.
- States the need for labeled (input, output) pairs.
- Specifies audio and transcription pairs for training.
Rubric: The response must classify the transcription as a rich output (or richer than a single number) and state that training requires the right labeled (input, output) pairs (audio and transcriptions).
0
1
Tags
Machine Learning
Deep Learning
Supervised Learning
Dive into Deep Learning @ D2L
Data Science
Machine Learning Strategy
Machine Learning Yearning @ DeepLearning.AI
Related
In end-to-end speech recognition, what are the input and output respectively?
A transcription output in speech recognition is richer than a single number, qualifying it as a rich output.
In speech recognition, _____ is the input and a transcription is the output.
Match each speech recognition component to its role in end-to-end rich-output learning.
Order the reasoning steps for determining whether a task qualifies as end-to-end rich-output learning.
How does MLY characterize the trend of end-to-end learning with rich outputs in modern deep learning?
MLY states that end-to-end learning with rich outputs is possible when you have the right labeled (input, output) pairs.
MLY describes end-to-end learning with rich outputs as an _____ trend in deep learning.
Match each output example to its correct classification as a rich output or a single-number output.
Order the steps that describe how end-to-end speech recognition operates as a rich-output learning problem.
Analyze the role of speech recognition as an example of an accelerating deep learning trend.
Classifying a Voice Assistant's Output and Determining End-to-End Data Requirements
Defining the Components and Nature of Rich-Output Speech Recognition