Learn Before
  • Transformer

A Survey of Transformers (Lin et al., 2021)

Lin, Tianyang; Wang, Yuxin; Liu, Xiangyang; Qiu, Xipeng (2021). A Survey of Transformers.

Tags
  • Data Science

Related
  • Self-attention layers' first approach

  • Transformers in contextual generation and summarization

  • Huggingface Model Summary

  • A Survey of Transformers (Lin et al., 2021)

  • Overview of a Transformer

  • Model Usage of Transformers

  • Attention in vanilla Transformers

  • Transformer Variants (X-formers)

  • The Pre-training and Fine-tuning Paradigm

  • Architectural Categories of Pre-trained Transformers

  • Transformer Blocks and Post-Norm Architecture

  • Model Depth (L) in Transformers

  • Computational Cost of Self-Attention in Transformers

  • Quadratic Complexity's Impact on Transformer Inference Speed

  • Pre-Norm Architecture in Transformers

  • Training Transformers as Language Models via Standard Optimization

  • Critique of the Transformer Architecture's Core Limitation

  • A research team is building a model to summarize extremely long scientific papers. They are comparing two distinct architectural approaches:

    • Approach 1: Processes the input text sequentially, token by token, updating an internal state that is passed from one step to the next.
    • Approach 2: Processes all input tokens simultaneously, using a mechanism that directly relates every token to every other token in the input to determine context.

    Which of the following statements best analyzes the primary trade-off between these two approaches for this specific task?
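
    (A minimal code sketch contrasting these two approaches appears after this list.)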

  • Architectural Design Choice for Machine Translation
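
Below is a minimal sketch, not taken from the survey or this card's source material, contrasting the two approaches described in the scenario question above: sequential recurrent updates versus all-pairs self-attention. It uses NumPy, and every name, shape, and weight is an illustrative assumption.

```python
# Illustrative sketch only: toy dimensions, random weights, no training.
import numpy as np

n, d = 6, 4                        # sequence length, hidden size (assumed toy values)
x = np.random.randn(n, d)          # stand-in token embeddings

# Approach 1: sequential processing with a recurrent internal state.
# Each step depends on the previous hidden state, so the n steps cannot
# be parallelized, but total cost grows linearly with sequence length.
W_h, W_x = np.random.randn(d, d), np.random.randn(d, d)
h = np.zeros(d)
for t in range(n):
    h = np.tanh(h @ W_h + x[t] @ W_x)      # strictly one token at a time

# Approach 2: self-attention over the whole input at once.
# Every token is related to every other token via an n x n score matrix,
# so memory/compute grow quadratically in n, but all positions are computed in parallel.
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d)                          # shape (n, n)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
context = weights @ V                                  # each token attends to every other token
```

The recurrent loop's n steps must run in order (roughly O(n·d²) work), while the attention version builds an n × n score matrix (roughly O(n²·d) work and O(n²) memory) yet processes all positions in parallel and relates every token to every other token directly; that is the trade-off the question targets.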

Learn After
  • Generating Long Sequences with Sparse Transformers (Child et al., 2019)