  1. Decoder-Only Transformers: The Workhorse of Generative LLMs

    The decoder-only transformer architecture is composed of several “blocks” of identical structure, stacked in sequence. Within each of these blocks, there are two primary components: …
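    A minimal sketch of that block structure, assuming the two components are masked self-attention and a position-wise feed-forward network (the standard pair in GPT-style models; the snippet is truncated before naming them). The attention and feed-forward callables here are hypothetical stand-ins:

    ```python
    import numpy as np

    def layer_norm(x, eps=1e-5):
        # Normalize each position's features to zero mean, unit variance.
        return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

    def decoder_block(x, self_attention, feed_forward):
        # Pre-norm residual wiring, as used by most GPT-style models.
        x = x + self_attention(layer_norm(x))  # component 1: masked self-attention
        x = x + feed_forward(layer_norm(x))    # component 2: feed-forward network
        return x
    ```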

  2. StableMask: Refining Causal Masking in Decoder-only Transformer

    Feb 7, 2024 · In this work, we propose StableMask: a parameter-free method to address both limitations by refining the causal mask. It introduces pseudo-attention values to balance …
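    For context, a sketch of the standard causal mask that StableMask refines (this is the baseline construction, not the StableMask method itself, which the paper defines):

    ```python
    import numpy as np

    def causal_mask(seq_len):
        # -inf above the diagonal: after softmax, position i can only
        # attend to positions j <= i (no peeking at future tokens).
        return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

    scores = np.random.randn(4, 4) + causal_mask(4)  # raw scores, future masked
    scores -= scores.max(-1, keepdims=True)          # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # row softmax
    ```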

  3. How does the (decoder-only) transformer architecture work?

    May 30, 2023 · LLMs/GPT models use a variant of this architecture called the “decoder-only transformer”. These GPT models are currently the most popular variety of transformer. The …

  4. Transformer Architecture: The Positional Encoding

    Sep 20, 2019 · Transformer architecture was introduced as a novel, attention-only sequence-to-sequence architecture by Vaswani et al. Its support for parallelizable training and its general …
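    The sinusoidal positional encoding from Vaswani et al. in code form (the standard formulation, assuming an even d_model):

    ```python
    import numpy as np

    def positional_encoding(seq_len, d_model):
        # PE(pos, 2i)   = sin(pos / 10000**(2i / d_model))
        # PE(pos, 2i+1) = cos(pos / 10000**(2i / d_model))
        pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
        i = np.arange(d_model // 2)[None, :]             # (1, d_model/2)
        angles = pos / np.power(10000, 2 * i / d_model)  # (seq_len, d_model/2)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
        pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
        return pe
    ```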

  5. Transformer Architecture | LLM: From Zero to Hero

    Feb 22, 2024 · In general, encoder-only architectures are proficient in extracting information from text for tasks like classification and regression, whereas decoder-only models specialize in …

  6. Decoder-Only Transformers Explained: The Engine Behind LLMs

    Aug 31, 2024 · Large language models (LLMs) like GPT-3, LLaMA, and Gemini are revolutionizing how we interact with and generate text. At the heart of these powerful models …

  7. Decoder-Only Transformers: The Brains Behind Generative AI, …

    Dec 20, 2024 · The most popular examples of AI models based on the decoder-only transformer are GPT, PaLM, LaMDA, Llama, and Falcon [23, 24]. This paper explains the transformer …

  8. Large Language Model (LLM) - Part 2/2: Transformer Architecture

    Feb 16, 2025 · Rather than feeding the embedded word vector (i.e., token embedding plus positional encoding) directly into the decoder layers, the Transformer first computes the Query …
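    A sketch of that step, with random matrices standing in for the learned projection weights:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model, d_k = 4, 8, 8
    x = rng.standard_normal((seq_len, d_model))  # token embeddings + positional encoding

    # Learned projection matrices (random stand-ins here).
    W_q = rng.standard_normal((d_model, d_k))
    W_k = rng.standard_normal((d_model, d_k))
    W_v = rng.standard_normal((d_model, d_k))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v         # Query, Key, Value
    scores = Q @ K.T / np.sqrt(d_k)             # scaled dot-product
    scores -= scores.max(-1, keepdims=True)     # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    output = weights @ V                        # attention output
    ```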

  9. How to Implement a Decoder-Only Transformer in TensorFlow

    In order to build a decoder-only transformer in TensorFlow, we need to implement the components of the decoder, such as self-attention layers, feed-forward networks, and …
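    A minimal sketch of one such block, assuming TensorFlow ≥ 2.10 (for MultiHeadAttention's use_causal_mask argument); the embedding layer, block stacking, and output projection are omitted:

    ```python
    import tensorflow as tf

    class DecoderBlock(tf.keras.layers.Layer):
        def __init__(self, d_model=256, num_heads=4, d_ff=1024):
            super().__init__()
            self.attn = tf.keras.layers.MultiHeadAttention(
                num_heads, d_model // num_heads)
            self.ffn = tf.keras.Sequential([
                tf.keras.layers.Dense(d_ff, activation="relu"),
                tf.keras.layers.Dense(d_model),
            ])
            self.norm1 = tf.keras.layers.LayerNormalization()
            self.norm2 = tf.keras.layers.LayerNormalization()

        def call(self, x):
            # Masked self-attention with a residual connection.
            x = self.norm1(x + self.attn(x, x, use_causal_mask=True))
            # Position-wise feed-forward network with a residual connection.
            return self.norm2(x + self.ffn(x))
    ```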
