Learn Before
Classification of Sparse Attention Models by Definition of G
Positional-based Sparse Attention
In positional-based sparse attention, the index set G is defined using pre-determined, heuristically designed patterns based on the relative positions of tokens, rather than their content. This means the sparsity pattern is fixed and does not depend on the input values. A common example of such a heuristic pattern is the sliding window, where the set G for a token at position i covers a fixed-size window of nearby tokens (e.g., positions i-k through i+k).
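The sliding-window rule can be made concrete as a boolean attention mask. Below is a minimal sketch in NumPy (the helper name sliding_window_mask and its window parameter are illustrative, not from the course): mask[i, j] is True exactly when |i - j| <= window, so the pattern depends only on positions, never on token content.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # True at (i, j) iff token i may attend to token j.
    # The rule is purely positional: |i - j| <= window.
    positions = np.arange(seq_len)
    return np.abs(positions[:, None] - positions[None, :]) <= window

# Example: 6 tokens, window of 1 token on each side.
# Row i shows which positions token i attends to; the pattern
# is identical for any input text of this length.
print(sliding_window_mask(6, 1).astype(int))
```

A causal variant would additionally require j <= i, so each token attends only to preceding positions, matching the "8 tokens immediately preceding" example in the question further below.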
Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Content-based Sparse Attention
Positional-based Sparse Attention
Classifying a Novel Sparse Attention Mechanism
An engineer develops a sparse attention mechanism where, for any given token, the set of other tokens it can attend to is defined by a pre-determined, structured pattern based on their relative distance in the sequence. For example, a token might only attend to the 8 tokens immediately preceding it. This attention pattern does not change, regardless of the specific words or meaning of the input text. Based on how the set of attended-to indices is defined, how should this mechanism be classified?
A key characteristic of all sparse attention models is that the set of attended-to indices for a given token is dynamically determined by finding other tokens with the most similar content.
Learn After
Atomic Sparse Attention Example Diagram
Compound Sparse Attention
Extended Sparse Attention
Analysis of a Sparse Attention Strategy
An engineer designs a sparse attention mechanism where, for any given token at position i, the model is only allowed to attend to the tokens within a fixed-size window around it (e.g., from position i-k to i+k). This rule is applied uniformly across the entire sequence, irrespective of the specific words involved. Which statement best analyzes the core principle of this design?
In a positional-based sparse attention mechanism, the set of tokens that a given token attends to is dynamically adjusted during processing based on the semantic similarity of the surrounding tokens.