1Cademy - Designing an Image Captioning System

Learn Before

End-to-End Image Captioning

Case Study

Designing an Image Captioning System

Case context: You are tasked with designing a machine learning system that can automatically generate descriptive text for photos uploaded by users on a social media platform.

Question: Based on the concept of end-to-end image captioning from Machine Learning Yearning, what should be the exact input and direct output of your neural network to achieve this goal?

Sample answer: To implement end-to-end image captioning, the neural network should take the uploaded user photo (the image, x) as its direct input and produce the descriptive text (the caption, y) as its direct output.

Key points:

Identify the user photo as the system's input (x).
Identify the descriptive text as the system's direct output (y).
Apply the end-to-end framework by having a single neural network map the input image directly to the output text, with no hand-designed intermediate steps.
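
The input/output contract described in the key points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Image`, `Example`, and `ToyCaptioner` are hypothetical names, and the nearest-neighbor lookup is a stand-in for a real neural network. The point is the shape of the interface: training uses only (image, caption) pairs, and prediction takes the image x and returns the caption y directly.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical representation: a grayscale pixel grid with values in [0, 1].
Image = List[List[float]]


@dataclass(frozen=True)
class Example:
    x: Image  # input: the uploaded photo
    y: str    # direct output: the descriptive caption


def _distance(a: Image, b: Image) -> float:
    """Sum of squared pixel differences between two same-sized images."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


class ToyCaptioner:
    """Stand-in for a neural network: learns a direct mapping x -> y."""

    def __init__(self) -> None:
        self._examples: List[Example] = []

    def fit(self, examples: Sequence[Example]) -> None:
        # Training data is only (image, caption) pairs; no intermediate labels.
        self._examples = list(examples)

    def predict(self, x: Image) -> str:
        # The direct output is the caption text itself.
        if not self._examples:
            raise ValueError("fit() must be called with at least one example")
        nearest = min(self._examples, key=lambda ex: _distance(ex.x, x))
        return nearest.y


train = [
    Example(x=[[0.0, 0.0], [0.0, 0.0]], y="a dark night sky"),
    Example(x=[[1.0, 1.0], [1.0, 1.0]], y="a bright snowy field"),
]
model = ToyCaptioner()
model.fit(train)
caption = model.predict([[0.9, 0.8], [1.0, 0.95]])
print(caption)  # prints "a bright snowy field"
```

In a real system the lookup would be replaced by a trained network, but the signature of `predict` (image in, caption out) would stay the same.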

Rubric: The learner must recognize the problem as an image captioning task and identify the uploaded photo as the direct input (x) and the descriptive text as the direct output (y).

Updated 2026-05-27

References

Machine Learning Yearning (Deeplearning.ai)
