  1. ViTAE-SL: A vision transformer-based autoencoder and spatial ...

    Mar 1, 2025 · In this paper, we present a new deep learning model called Vision Transformer-based Autoencoder (ViTAE) for reconstructing large-scale and complex fields. The proposed …

  2. [2301.07382] ViT-AE++: Improving Vision Transformer Autoencoder

    Jan 18, 2023 · Vision transformer-based autoencoder (ViT-AE) by He et al. (2021) is a recent self-supervised learning technique that employs a patch-masking strategy to learn a meaningful …

  3. Leveraging two-dimensional pre-trained vision transformers for …

    Jan 25, 2025 · We employ the adept 2D information to direct a 3D masking-based autoencoder, which uses an encoder-decoder architecture to rebuild the masked point tokens through self …

  4. Rethinking Vision Transformer and Masked Autoencoder in

    Jun 5, 2024 · In this paper, we investigate three key factors (i.e., inputs, pre-training, and finetuning) in ViT for multimodal FAS with RGB, Infrared (IR), and Depth. First, in terms of the …

  5. SMAE-Fusion: Integrating saliency-aware masked autoencoder

    May 1, 2025 · Emerging as a powerful self-supervised training paradigm, masked image modeling enables the learning of robust feature representations applicable to various downstream tasks. …

  6. Adaptive Masked Autoencoder Transformer for image …

    Oct 1, 2024 · In order to address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a masked image modeling-based method. AMAT …

  7. In this paper, we present a new deep learning model called Vision Transformer-based Autoencoder (ViTAE) for reconstructing large-scale and complex fields. The proposed …

  8. Medical Image Synthesis Using Autoencoder with Vision Transformer

    This paper proposes a novel architecture for synthesizing CMR images from TTE inputs using an integrated autoencoder and vision transformer. The autoencoder captures TTE patterns and …

  9. Spatial–Temporal Heatmap Masked Autoencoder for Skeleton …

    2 days ago · During pre-training, a Vision Transformer-based autoencoder equipped with a lightweight prediction head reconstructs the masked regions, fostering the extraction of robust …

  10. Generalized Concordant Vision Transformer with Masked Image …

    3 days ago · Abstract: The vision transformer (ViT) architecture offers significant advantages in object detection tasks. However, some limitations affect improving task performance. Firstly, …