Comparison of ELMo and GPT on Downstream Adaptation

ELMo and GPT take different approaches to adapting a pretrained model to downstream natural language processing tasks. ELMo supplies context-sensitive representations as additional features, but each target task still needs its own hand-crafted, task-specific architecture, and ELMo's pretrained parameters stay frozen during supervised learning, so only the task-specific model is trained. GPT is task-agnostic by design: it uses a single pretrained Transformer decoder for every task, handles a downstream task by adding a linear output layer on top, and fine-tunes all of its pretrained parameters during downstream training instead of freezing them.
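
The difference between the two training regimes can be illustrated with a deliberately tiny sketch. It is not ELMo or GPT: a single scalar weight `enc` stands in for the pretrained model, a scalar `head` stands in for the newly added output layer, and the data, learning rate, and the `train` function itself are hypothetical choices made only for illustration. With `freeze_encoder=True` only the head is updated (the ELMo-style regime); with `freeze_encoder=False` both weights are updated (the GPT-style regime).

```python
def train(freeze_encoder, steps=500, lr=0.01):
    """Fit y ~ head * (enc * x) by gradient descent on mean squared error.

    enc  : stand-in for pretrained parameters (hypothetical scalar)
    head : stand-in for the new task-specific output layer
    Returns the final (enc, head).
    """
    enc, head = 0.5, 0.0  # "pretrained" value and freshly initialized head
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy downstream task: y = 2x

    for _ in range(steps):
        g_enc = 0.0
        g_head = 0.0
        for x, y in data:
            feat = enc * x              # representation from the "pretrained" part
            err = head * feat - y       # prediction error
            g_head += 2 * err * feat    # d(loss)/d(head)
            g_enc += 2 * err * head * x # d(loss)/d(enc)
        g_head /= len(data)
        g_enc /= len(data)

        head -= lr * g_head             # the output layer is always trained
        if not freeze_encoder:
            enc -= lr * g_enc           # pretrained part updated only when fine-tuning
    return enc, head


frozen_enc, frozen_head = train(freeze_encoder=True)    # ELMo-style: enc untouched
tuned_enc, tuned_head = train(freeze_encoder=False)     # GPT-style: enc also adapts
print("frozen    :", frozen_enc, frozen_head)
print("fine-tuned:", tuned_enc, tuned_head)
```

In the frozen run the stand-in pretrained weight keeps its initial value and the head alone has to absorb the task, whereas in the fine-tuned run both weights move. In a deep learning framework the same choice is typically made by excluding or including the pretrained parameters in the set the optimizer updates.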

Updated 2026-05-29

Tags

D2L

Dive into Deep Learning @ D2L