Iniysa on X reports that Apple's M4 Max completed an audio transcription with Whisper V3 Turbo in half the time of Nvidia's Ampere-based RTX A5000 GPU while drawing roughly one-eighth the power.
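Taken together, the two figures imply an energy gap larger than either one alone, since energy is power multiplied by time. The sketch below works through that arithmetic. It assumes "nearly eight times" is rounded to exactly 8 and that the reported power is an average over the whole run; the function name and the variable names are illustrative and do not come from the report.

```python
def relative_energy(time_ratio: float, power_ratio: float) -> float:
    """Return the energy ratio implied by a time ratio and a power ratio.

    Energy = average power x elapsed time, so the ratios multiply.
    """
    return time_ratio * power_ratio


# Assumed round figures: half the time, about one-eighth the power.
time_ratio = 1 / 2    # M4 Max time / RTX A5000 time
power_ratio = 1 / 8   # M4 Max power / RTX A5000 power (rounded assumption)

energy_ratio = relative_energy(time_ratio, power_ratio)
print(f"M4 Max energy relative to RTX A5000: {energy_ratio:.4f}")
```

Under these assumptions the M4 Max would use about one-sixteenth of the energy for the same job. The real ratio depends on how power was measured, which the report does not specify.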
We train our model with two NVIDIA A5000 (24GB) GPUs for about two days. However, the model should perform reasonably well after 12 hours of training. It is also possible to train with a single GPU.
GPU allocation: Our experiments were run on NVIDIA A5000 GPUs. For finetuning, we use one GPU for CIFAR-10 and three GPUs for 256x256 resolution datasets. Check out the clean-fid paper On Aliased ...