Example

Iterative Application of Argmax for Next Token Prediction

The argmax function is applied iteratively to select the most probable next token at each step of sequence generation. For a sequence beginning with the prefix $\langle s \rangle\, a$, the model first predicts the token $x_2$ by maximizing the conditional probability given $\langle s \rangle\, a$. The selected token is appended to the context, and the extended context is used to predict $x_3$, and so on. If the model selects $b$ and then $c$, the context grows to $\langle s \rangle\, a\, b$ and then to $\langle s \rangle\, a\, b\, c$. This step-by-step process is illustrated by the following sequence of operations:

  1. Predict the second token: $\operatorname*{arg\,max}_{x_2 \in V} \Pr(x_2 \mid \langle s \rangle\, a)$
  2. Predict the third token: $\operatorname*{arg\,max}_{x_3 \in V} \Pr(x_3 \mid \langle s \rangle\, a\, b)$
  3. Predict the fourth token: $\operatorname*{arg\,max}_{x_4 \in V} \Pr(x_4 \mid \langle s \rangle\, a\, b\, c)$

This iterative selection, where each new token is chosen by maximizing its conditional probability based on the preceding context, is a core mechanism of greedy decoding in autoregressive models.
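
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a real model: `toy_next_token_probs` is a hypothetical lookup table standing in for $\Pr(x_i \mid x_0 \dots x_{i-1})$, the probabilities in it are invented for the example, and the end-of-sequence symbol `</s>` is an assumption that does not appear in the text above.

```python
from typing import Callable, Dict, List

# Hypothetical vocabulary V; "</s>" is an assumed end-of-sequence token.
VOCAB = ["a", "b", "c", "</s>"]


def toy_next_token_probs(context: List[str]) -> Dict[str, float]:
    """Stand-in for a model's conditional distribution over the next token.

    The numbers are made up for illustration. Any context not listed
    falls back to a uniform distribution over VOCAB.
    """
    table = {
        ("<s>", "a"): {"a": 0.1, "b": 0.7, "c": 0.1, "</s>": 0.1},
        ("<s>", "a", "b"): {"a": 0.1, "b": 0.1, "c": 0.6, "</s>": 0.2},
        ("<s>", "a", "b", "c"): {"a": 0.2, "b": 0.1, "c": 0.1, "</s>": 0.6},
    }
    uniform = {token: 1.0 / len(VOCAB) for token in VOCAB}
    return table.get(tuple(context), uniform)


def greedy_decode(
    prefix: List[str],
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    max_new_tokens: int,
    eos: str = "</s>",
) -> List[str]:
    """Repeatedly append the argmax token to the context."""
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        # argmax over the vocabulary: the token with the highest probability
        best = max(probs, key=probs.get)
        tokens.append(best)
        if best == eos:
            break
    return tokens


result = greedy_decode(["<s>", "a"], toy_next_token_probs, max_new_tokens=3)
print(result)
```

With this toy table, the first two iterations select `b` and `c`, matching the contexts in steps 2 and 3 above. Because each step commits to a single token, the loop never revisits an earlier choice, which is what makes the procedure greedy.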

Updated 2026-04-18

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Computing Sciences