GPT (Generative Pre-Training)

GPT (Generative Pre-Training) is an effort to design a general, task-agnostic model for context-sensitive representations. Built on a Transformer decoder, it pretrains a language model that is then used to represent text sequences. To apply GPT to a downstream task, the output of the language model is fed into an added linear output layer that predicts the task's label. In sharp contrast to earlier models such as ELMo, which freeze the parameters of the pretrained model, GPT fine-tunes all parameters of the pretrained Transformer decoder during supervised learning on the downstream task. Evaluated on twelve tasks spanning natural language inference, question answering, sentence similarity, and classification, GPT improved the state of the art in nine of them with minimal changes to the model architecture.
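The difference between freezing the pretrained weights and fine-tuning all of them can be illustrated with a toy sketch. This is not GPT itself: the "pretrained body" below is only an embedding table whose mean stands in for the Transformer decoder's final-position output, and every name (`init_model`, `encode`, `sgd_step`, `freeze_body`) is hypothetical, chosen for illustration. What it shows is the added linear layer on top of the body, and that gradients reach the body's parameters only when they are not frozen.

```python
import copy
import math
import random


def init_model(vocab_size, dim, num_classes, seed=0):
    """Create toy parameters: a 'pretrained' body and a new linear head."""
    rng = random.Random(seed)
    return {
        # Assumption: stand-in for the pretrained decoder weights (embeddings only).
        "body": [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)],
        # Added linear output layer: logits = W h + b.
        "W": [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_classes)],
        "b": [0.0] * num_classes,
    }


def encode(model, tokens):
    """Return a representation of the sequence at its final position.

    A real decoder would apply masked self-attention; averaging the token
    embeddings is a crude stand-in for what the last position can see.
    """
    dim = len(model["body"][0])
    return [sum(model["body"][t][d] for t in tokens) / len(tokens) for d in range(dim)]


def forward(model, tokens):
    """Return the body output h and the softmax class probabilities."""
    h = encode(model, tokens)
    logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(model["W"], model["b"])]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return h, [e / total for e in exps]


def loss(model, tokens, label):
    """Cross-entropy loss for a single labeled sequence."""
    _, probs = forward(model, tokens)
    return -math.log(probs[label])


def sgd_step(model, tokens, label, lr=0.5, freeze_body=False):
    """One SGD step; the body is updated only when freeze_body is False."""
    h, probs = forward(model, tokens)
    # Gradient of cross-entropy w.r.t. the logits: softmax(logits) - one_hot(label).
    dlogits = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
    # Gradient w.r.t. h, computed before the head is modified.
    dh = [sum(dlogits[c] * model["W"][c][d] for c in range(len(dlogits))) for d in range(len(h))]
    # The added linear layer is always trained.
    for c, g in enumerate(dlogits):
        for d in range(len(h)):
            model["W"][c][d] -= lr * g * h[d]
        model["b"][c] -= lr * g
    # Fine-tuning all parameters: the gradient also flows into the body.
    if not freeze_body:
        for t in tokens:
            for d in range(len(h)):
                model["body"][t][d] -= lr * dh[d] / len(tokens)


# Toy labeled example (made-up token ids and label).
tokens, label = [3, 1, 4, 1, 5], 2
finetuned = init_model(vocab_size=8, dim=4, num_classes=3)
frozen = copy.deepcopy(finetuned)
pretrained_body = copy.deepcopy(finetuned["body"])
initial_loss = loss(finetuned, tokens, label)

for _ in range(20):
    sgd_step(finetuned, tokens, label)                  # GPT-style: update everything
    sgd_step(frozen, tokens, label, freeze_body=True)   # frozen body: head only
```

After the loop, `frozen["body"]` is still identical to `pretrained_body`, while `finetuned["body"]` has moved away from it. In a real setting the body would be a pretrained Transformer decoder and the update would come from an automatic-differentiation library rather than hand-written gradients.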

Updated 2026-05-29

Tags

Data Science

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

D2L

Dive into Deep Learning @ D2L