Learn Before
Transformer
A Survey of Transformers (Lin et al., 2021)
Lin, Tianyang, Yuxin Wang, Xiangyang Liu, and Xipeng Qiu. (2021). A Survey of Transformers.
Tags
Data Science
Related
First approach to self-attention layers
Transformers in contextual generation and summarization
Huggingface Model Summary
A Survey of Transformers (Lin et al., 2021)
Overview of a Transformer
Model Usage of Transformers
Attention in vanilla Transformers
Transformer Variants (X-formers)
The Pre-training and Fine-tuning Paradigm
Architectural Categories of Pre-trained Transformers
Transformer Blocks and Post-Norm Architecture
Model Depth (L) in Transformers
Computational Cost of Self-Attention in Transformers
Quadratic Complexity's Impact on Transformer Inference Speed
Pre-Norm Architecture in Transformers
Training Transformers as Language Models via Standard Optimization
Critique of the Transformer Architecture's Core Limitation
A research team is building a model to summarize extremely long scientific papers. They are comparing two distinct architectural approaches:
- Approach 1: Processes the input text sequentially, token by token, updating an internal state that is passed from one step to the next.
- Approach 2: Processes all input tokens simultaneously, using a mechanism that directly relates every token to every other token in the input to determine context.
Which of the following statements best analyzes the primary trade-off between these two approaches for this specific task? (A minimal code sketch contrasting the two approaches appears after this list.)
Architectural Design Choice for Machine Translation
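The trade-off in the question above can be made concrete with a small sketch. The NumPy code below is an illustrative assumption, not code from the survey: the function names, shapes, and weight matrices are made up for the example. It contrasts a sequential state update (Approach 1), whose work grows linearly with sequence length but cannot be parallelized across steps, with an all-pairs attention step (Approach 2), which processes every position in parallel but builds an n × n score matrix, i.e. quadratic time and memory in the input length.

```python
# Minimal sketch (assumed example, not the survey's code) of the two approaches.
import numpy as np

def sequential_state_update(tokens, W_x, W_h):
    """Approach 1: process tokens one by one, carrying a fixed-size hidden state.
    Cost grows linearly with length, but the loop cannot be parallelized."""
    h = np.zeros(W_h.shape[0])
    for x in tokens:                          # strictly sequential steps
        h = np.tanh(W_x @ x + W_h @ h)        # state summarizes everything seen so far
    return h

def all_pairs_attention(tokens):
    """Approach 2: relate every token to every other token at once.
    All positions are handled in parallel, but the score matrix is n x n,
    so time and memory grow quadratically with sequence length."""
    X = np.stack(tokens)                       # (n, d)
    scores = X @ X.T / np.sqrt(X.shape[1])     # (n, n) pairwise interactions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X                         # each output mixes all inputs

# Toy usage: 6 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
tokens = [rng.normal(size=4) for _ in range(6)]
h_final = sequential_state_update(tokens, rng.normal(size=(4, 4)), rng.normal(size=(4, 4)))
context = all_pairs_attention(tokens)
print(h_final.shape, context.shape)            # (4,) and (6, 4)
```

For a very long scientific paper, the n × n score matrix is the bottleneck of Approach 2, while Approach 1's fixed-size state and step-by-step loop limit both parallel training and long-range recall; this is the tension the question asks you to weigh.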
Learn After
Generating Long Sequences with Sparse Transformers (Child et al., 2019)