Learn Before
  • Classification of Sparse Attention Models by Definition of G

Positional-based Sparse Attention

In positional-based sparse attention, the index set G is defined using pre-determined, heuristically designed patterns based on the relative positions of tokens rather than their content. The sparsity pattern is therefore fixed and does not depend on the input values. A common example of such a pattern is the sliding window, where the set G for a token at position i covers a fixed-size window of nearby tokens.
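To make the idea concrete, here is a minimal NumPy sketch of sliding-window sparse attention. The window half-width w, the function names, and the toy shapes are illustrative assumptions rather than definitions from the course material; the point is only that the mask is computed from positions alone, before any input content is seen.

```python
# Minimal sketch: positional-based sparse attention via a sliding window.
# All names (sliding_window_mask, sparse_attention, w) are illustrative
# assumptions, not from the course material.
import numpy as np

def sliding_window_mask(seq_len: int, w: int) -> np.ndarray:
    """Boolean mask where position i may attend to positions j with
    |i - j| <= w. Depends only on positions, never on token content."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= w

def sparse_attention(Q, K, V, w):
    """Scaled dot-product attention restricted to the positional mask."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # (seq_len, seq_len)
    mask = sliding_window_mask(Q.shape[0], w)  # fixed, input-independent
    scores = np.where(mask, scores, -np.inf)   # block out-of-window pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Example: 8 tokens, head dimension 4, window of +/-2 positions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 4))
K = rng.standard_normal((8, 4))
V = rng.standard_normal((8, 4))
out = sparse_attention(Q, K, V, w=2)
print(out.shape)  # (8, 4)
```

Because the mask depends only on seq_len and w, it can be precomputed once and reused for every input, which is exactly what distinguishes positional-based patterns from content-based ones.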

Tags
  • Data Science

  • Ch.2 Generative Models - Foundations of Large Language Models

  • Foundations of Large Language Models

  • Foundations of Large Language Models Course

  • Computing Sciences

Related
  • Content-based Sparse Attention

  • Positional-based Sparse Attention

  • Classifying a Novel Sparse Attention Mechanism

  • An engineer develops a sparse attention mechanism where, for any given token, the set of other tokens it can attend to is defined by a pre-determined, structured pattern based on their relative distance in the sequence. For example, a token might only attend to the 8 tokens immediately preceding it. This attention pattern does not change, regardless of the specific words or meaning of the input text. Based on how the set of attended-to indices is defined, how should this mechanism be classified?

  • A key characteristic of all sparse attention models is that the set of attended-to indices for a given token is dynamically determined by finding other tokens with the most similar content.

Learn After
  • Atomic Sparse Attention Example Diagram

  • Compound Sparse Attention

  • Extended Sparse Attention

  • An engineer designs a sparse attention mechanism where, for any given token at position i, the model is only allowed to attend to the tokens within a fixed-size window around it (e.g., from position i-k to i+k). This rule is applied uniformly across the entire sequence, irrespective of the specific words involved. Which statement best analyzes the core principle of this design?

  • Analysis of a Sparse Attention Strategy

  • In a positional-based sparse attention mechanism, the set of tokens that a given token attends to is dynamically adjusted during processing based on the semantic similarity of the surrounding tokens.