  1. Why can decoder-only transformers be so good at machine translation

    Jun 8, 2023 · In my understanding, encoder-decoder transformers for translation are trained with sentence or text pairs. How can it be explained in simple (high-level) terms that decoder-only transformers (e.g. G...

  2. Mastering Decoder-Only Transformer: A Comprehensive Guide

    Apr 26, 2024 · Explore the architecture and components of the Decoder-Only Transformer model. Understand the role of attention mechanisms, including Scaled Dot-Product Attention and Masked Self-Attention, in the model. Examine the importance of positional embeddings and normalization techniques in transformer models. (A minimal sketch of masked attention follows this list.)

  3. How does the (decoder-only) transformer architecture work?

    May 30, 2023 · However, models such as GPT-3, ChatGPT, GPT-4 & LaMDA use the (decoder-only) transformer architecture. It is key to first understand the input and output of a transformer: the input is a prompt (often referred to as context) fed into the transformer as a whole; there is no recurrence. The output depends on the goal of the model. (A sketch of this recurrence-free generation loop follows this list.)

  4. Decoder-Only Transformers Explained: The Engine Behind LLMs

    Aug 31, 2024 · Large language models (LLMs) like GPT-3, LLaMA, and Gemini are revolutionizing how we interact with and generate text. At the heart of these powerful models lies a specific type of neural network...

  5. Decoder-Only Transformers: The Workhorse of Generative LLMs

    Decoder-only transformers receive a textual prompt as input. First, we use a tokenizer, based on an algorithm like Byte-Pair Encoding, to break this text into discrete tokens. Then, we map each of these tokens to a corresponding token vector stored within an embedding layer. (A sketch of this tokenize-and-embed step follows this list.)

  6. Decoder-only Streaming Transformer for Simultaneous Translation

    Apr 18, 2025 · However, directly applying the Decoder-only architecture to SiMT poses challenges in terms of training and inference. To alleviate the above problems, we propose the first Decoder-only SiMT model, named Decoder-only Streaming Transformer (DST).

  7. Scaling Laws of Decoder-Only Models on the Multilingual …

    Sep 23, 2024 · Recent studies have showcased remarkable capabilities of decoder-only models in many NLP tasks, including translation. Yet, the machine translation field has been largely dominated by encoder-decoder models based on the Transformer architecture.

  8. Exploring Decoder-Only Transformers for NLP and More

    Jan 27, 2023 · A “decoder-only transformer” is a type of neural network architecture that’s commonly used in natural language processing tasks such as machine translation and text summarization. It is a variation of the original transformer architecture, which was introduced in the 2017 paper by Google researchers, “Attention Is All You Need.”

  9. The Mechanics of Transformer Models: Decoding the Decoder-Only ...

    Encoder-only models, exemplified by BERT (Bidirectional Encoder Representations from Transformers), specialize in understanding or “encoding” language. They excel at tasks that require a deep comprehension of context, such as sentiment analysis, language understanding, and text classification.

  10. Scaling Laws of Decoder-Only Models on the Multilingual …

    We trained a collection of six decoder-only models, ranging from 70M to 7B parameters, on a sentence-level, multilingual (8 languages) and multidomain (9 domains) dataset.
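The attention mechanisms named in result 2 can be illustrated in a few lines. Below is a minimal sketch of scaled dot-product attention with a causal (masked) variant, using NumPy only; the single-head setup, shapes, and toy input are assumptions for illustration, not any particular library's API.

```python
# A minimal sketch, assuming a single attention head and NumPy only.
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=True):
    """Scaled dot-product attention with an optional causal mask.

    Q, K, V: arrays of shape (seq_len, d_k). The causal mask prevents
    each position from attending to later positions, which is what
    makes decoder-only models autoregressive.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (seq_len, seq_len)
    if causal:
        # Upper-triangular positions (future tokens) get -inf before softmax.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: 4 tokens, 8-dimensional queries/keys/values (self-attention).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```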
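Result 3's point that there is no recurrence, and that the whole prompt is fed to the model as context, is easiest to see in the generation loop. The sketch below assumes a hypothetical `model` callable returning next-token logits; the toy stand-in model is invented purely to make the example runnable.

```python
# A minimal sketch of the autoregressive loop. `model` is a hypothetical
# callable mapping a list of token ids to logits for the NEXT token.
import numpy as np

def generate(model, prompt_ids, max_new_tokens=10, eos_id=None):
    """Greedy decoding: the entire context is re-fed at every step;
    no recurrent hidden state is carried between steps."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)               # (vocab_size,) next-token logits
        next_id = int(np.argmax(logits))  # greedy choice; sampling is also common
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids

# Toy stand-in model: always predicts (last_token + 1) mod vocab_size.
vocab_size = 50
toy_model = lambda ids: np.eye(vocab_size)[(ids[-1] + 1) % vocab_size]
print(generate(toy_model, [3], max_new_tokens=5))  # [3, 4, 5, 6, 7, 8]
```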
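Result 5 describes the tokenize-then-embed pipeline at the model's input. In the sketch below, a whitespace tokenizer stands in for a real BPE tokenizer, and the vocabulary and embedding matrix are made up for illustration.

```python
# A minimal sketch of the token -> vector step. A whitespace tokenizer
# stands in for a real BPE tokenizer; the vocabulary and embedding
# matrix here are invented for illustration.
import numpy as np

vocab = {"<unk>": 0, "decoder": 1, "only": 2, "transformers": 3, "are": 4, "fun": 5}
d_model = 8
rng = np.random.default_rng(42)
embedding = rng.normal(size=(len(vocab), d_model))  # one row per token id

def encode(text):
    """Map text to token ids (real systems use BPE, not whitespace splitting)."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

ids = encode("Decoder only transformers are fun")
vectors = embedding[ids]      # embedding lookup: (num_tokens, d_model)
print(ids)                    # [1, 2, 3, 4, 5]
print(vectors.shape)          # (5, 8)
```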
