Learn Before
  • Classification of Sparse Attention Models by Definition of G

Positional-based Sparse Attention

In positional-based sparse attention, the index set G is defined using pre-determined, heuristically designed patterns based on the relative positions of tokens rather than their content. The sparsity pattern is therefore fixed and does not depend on the input values. A common example of such a pattern is the sliding window, where the set G for a token at position i covers a fixed-size window of nearby tokens.
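To make the idea concrete, here is a minimal NumPy sketch of sliding-window sparse attention. The window half-width w, the function names, and the toy shapes are illustrative assumptions rather than definitions from the course material; the point is only that the mask is computed from positions alone, before any input content is seen.

```python
# Minimal sketch: positional-based sparse attention via a sliding window.
# All names (sliding_window_mask, sparse_attention, w) are illustrative
# assumptions, not from the course material.
import numpy as np

def sliding_window_mask(seq_len: int, w: int) -> np.ndarray:
    """Boolean mask where position i may attend to positions j with
    |i - j| <= w. Depends only on positions, never on token content."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= w

def sparse_attention(Q, K, V, w):
    """Scaled dot-product attention restricted to the positional mask."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # (seq_len, seq_len)
    mask = sliding_window_mask(Q.shape[0], w)  # fixed, input-independent
    scores = np.where(mask, scores, -np.inf)   # block out-of-window pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Example: 8 tokens, head dimension 4, window of +/-2 positions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 4))
K = rng.standard_normal((8, 4))
V = rng.standard_normal((8, 4))
out = sparse_attention(Q, K, V, w=2)
print(out.shape)  # (8, 4)
```

Because the mask depends only on seq_len and w, it can be precomputed once and reused for every input, which is exactly what distinguishes positional-based patterns from content-based ones.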

Tags
  • Data Science

  • Ch.2 Generative Models - Foundations of Large Language Models

  • Foundations of Large Language Models

  • Foundations of Large Language Models Course

  • Computing Sciences

Related
  • Content-based Sparse Attention

  • Positional-based Sparse Attention

  • Classifying a Novel Sparse Attention Mechanism

  • An engineer develops a sparse attention mechanism where, for any given token, the set of other tokens it can attend to is defined by a pre-determined, structured pattern based on their relative distance in the sequence. For example, a token might only attend to the 8 tokens immediately preceding it. This attention pattern does not change, regardless of the specific words or meaning of the input text. Based on how the set of attended-to indices is defined, how should this mechanism be classified?

  • A key characteristic of all sparse attention models is that the set of attended-to indices for a given token is dynamically determined by finding other tokens with the most similar content.

Learn After
  • Atomic Sparse Attention Example Diagram

  • Compound Sparse Attention

  • Extended Sparse Attention

  • An engineer designs a sparse attention mechanism where, for any given token at position i, the model is only allowed to attend to the tokens within a fixed-size window around it (e.g., from position i-k to i+k). This rule is applied uniformly across the entire sequence, irrespective of the specific words involved. Which statement best analyzes the core principle of this design?

  • Analysis of a Sparse Attention Strategy

  • In a positional-based sparse attention mechanism, the set of tokens that a given token attends to is dynamically adjusted during processing based on the semantic similarity of the surrounding tokens.