Concept

Key-Value (KV) Cache in Transformer Inference

The Key-Value (KV) cache is a crucial component for efficient autoregressive inference in Transformer models. It functions as a memory store for the key and value vectors representing all previously processed tokens. At each generation step, instead of recomputing these vectors for the entire preceding sequence, the model generates a new representation for the current token and has it attend to the historical representations stored in the cache. This mechanism of storing and reusing past context significantly improves inference speed and is fundamental to the model's operation.

0

1

Updated 2026-05-02

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models

Related
Learn After