Case Study

Classifying a Voice Assistant's Output and Determining End-to-End Data Requirements

Case context: A machine learning team is building a new voice assistant. They intend to use an end-to-end deep learning architecture that directly processes spoken commands and generates complete text sentences for downstream processing.

Question: Based on Machine Learning Yearning, how would you classify the output of this speech recognition system, and what specific data must the team collect to successfully train this end-to-end model?

Sample answer: The output of this system is classified as a "rich output" because a sentence transcription is more complex than a single number. To successfully train this end-to-end model, the team must collect a dataset consisting of the right labeled (input, output) pairs, specifically pairs of audio recordings and their corresponding text transcriptions.

Key points:

  • Classifies the transcription output as a rich output.
  • States the need for labeled (input, output) pairs.
  • Specifies audio and transcription pairs for training.

Rubric: The response must classify the transcription as a rich output (or richer than a single number) and state that training requires the right labeled (input, output) pairs (audio and transcriptions).

0

1

Updated 2026-05-27

Contributors are:

Who are from:

Tags

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Data Science

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI