Case Study

Designing an Image Captioning System

Case context: You are tasked with designing a machine learning system that can automatically generate descriptive text for photos uploaded by users on a social media platform.

Question: Based on the concept of end-to-end image captioning from Machine Learning Yearning, what should be the exact input and direct output of your neural network to achieve this goal?

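Sample answer: To implement end-to-end image captioning, the neural network should take the user's uploaded photo (the image, x) as its direct input and produce the descriptive text (the caption, y) as its direct output. A single network learns the mapping from x to y from (image, caption) training pairs, with no hand-designed intermediate stages in between.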

Key points:

  • Identify the user photo as the system's input (x).
  • Identify the descriptive text as the system's direct output (y).
  • Apply the end-to-end framework: one network maps the input image directly to the output text.
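
The input/output contract in the key points above can be sketched in code. The following is a minimal, untrained toy in plain Python, not a real captioning model: the class name ToyCaptioner, the vocabulary, the pixel-averaging "encoder", and the greedy word-picking loop are all hypothetical stand-ins chosen for illustration. The only point it is meant to show is the interface: the photo (x) goes in, the caption string (y) comes out, and nothing hand-designed sits in between.

```python
import random
from typing import List

# Hypothetical toy vocabulary; a real system would build one from caption data.
VOCAB = ["<end>", "a", "dog", "cat", "on", "grass", "beach"]


class ToyCaptioner:
    """Untrained stand-in for an end-to-end network: image x -> caption y."""

    def __init__(self, feature_dim: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.feature_dim = feature_dim
        # One weight row per vocabulary word. Each row scores the image
        # features plus one extra value, the current decoding step.
        self.weights = [
            [rng.uniform(-1, 1) for _ in range(feature_dim + 1)] for _ in VOCAB
        ]

    def encode(self, image: List[List[float]]) -> List[float]:
        # Crude "encoder": average consecutive chunks of the flattened pixels.
        pixels = [p for row in image for p in row]
        chunk = max(1, len(pixels) // self.feature_dim)
        return [
            sum(pixels[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(self.feature_dim)
        ]

    def caption(self, image: List[List[float]], max_len: int = 5) -> str:
        features = self.encode(image)
        words = []
        for step in range(max_len):
            inputs = features + [float(step)]
            scores = [
                sum(w * v for w, v in zip(row, inputs)) for row in self.weights
            ]
            word = VOCAB[scores.index(max(scores))]  # greedy choice
            if word == "<end>":
                break
            words.append(word)
        return " ".join(words)


# Hypothetical 4x4 grayscale photo standing in for a user upload (x).
photo = [[0.1, 0.5, 0.9, 0.3] for _ in range(4)]
model = ToyCaptioner()
y = model.caption(photo)  # caption string (y); meaningless until trained
```

Because the weights are random, the returned caption carries no meaning. In a real system the same x-to-y interface would be kept, and the weights would be fit on (image, caption) pairs.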

Rubric: The learner must recognize the problem as an image captioning task and identify the uploaded photo as the direct input (x) and the descriptive text as the direct output (y).

Updated 2026-05-27

Tags

Python Programming Language

Data Science

Machine Learning

Deep Learning

Supervised Learning

Dive into Deep Learning @ D2L

Machine Learning Strategy

Machine Learning Yearning @ DeepLearning.AI