
ViTAE-SL: A vision transformer-based autoencoder and spatial ...
Mar 1, 2025 · In this paper, we present a new deep learning model called Vision Transformer-based Autoencoder (ViTAE) for reconstructing large-scale and complex fields. The proposed …
[2301.07382] ViT-AE++: Improving Vision Transformer Autoencoder …
Jan 18, 2023 · Vision transformer-based autoencoder (ViT-AE) by He et al. (2021) is a recent self-supervised learning technique that employs a patch-masking strategy to learn a meaningful …
Leveraging two-dimensional pre-trained vision transformers for …
Jan 25, 2025 · We employ the adept 2D information to direct a 3D masking-based autoencoder, which uses an encoder-decoder architecture to rebuild the masked point tokens through self …
Rethinking Vision Transformer and Masked Autoencoder in
Jun 5, 2024 · In this paper, we investigate three key factors (i.e., inputs, pre-training, and finetuning) in ViT for multimodal FAS with RGB, Infrared (IR), and Depth. First, in terms of the …
SMAE-Fusion: Integrating saliency-aware masked autoencoder …
May 1, 2025 · Emerging as a powerful self-supervised training paradigm, masked image modeling enables the learning of robust feature representations applicable to various downstream tasks. …
Adaptive Masked Autoencoder Transformer for image …
Oct 1, 2024 · In order to address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a masked image modeling-based method. AMAT …
Medical Image Synthesis Using Autoencoder with Vision Transformer
This paper proposes a novel architecture for synthesizing CMR images from TTE inputs using an integrated autoencoder and vision transformer. The autoencoder captures TTE patterns and …
Spatial–Temporal Heatmap Masked Autoencoder for Skeleton …
2 days ago · During pre-training, a Vision Transformer-based autoencoder equipped with a lightweight prediction head reconstructs the masked regions, fostering the extraction of robust …
Generalized Concordant Vision Transformer with Masked Image …
3 days ago · Abstract: The vision transformer (ViT) architecture offers significant advantages in object detection tasks. However, some limitations affect improving task performance. Firstly, …