News

FramePack can be trained with a much larger batch size, similar to the batch size for image diffusion training ... at 30fps (1800 frames) using a 13B model, the minimum required GPU memory is 6GB. (Yes ...
We introduce a diffusion-based singing voice synthesis model that operates without F0. MIDI-Voice 2 consists of two diffusion models: a prior generator and a singing voice generator. The prior ...
cultivation in the Tashkent region of Uzbekistan using the HYDRUS-1D simulation model. Field experiments were conducted over two growing seasons, accompanied by soil moisture monitoring, root zone ...
To address this pressing issue, we propose SocioDiff, a novel diffusion-based model for generating synthetic residential electricity consumption data that incorporates socio-demographic information.
(1) We released the 50-step diffusion model (instead of 1000 steps), which runs 20X faster with comparable results. (2) Calling CLIP just once and caching the result makes all models run 2X faster.
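
The "call CLIP once and cache the result" idea in item (2) can be illustrated with a minimal sketch. This is not the project's actual code: `encode_text` is a hypothetical stand-in for an expensive CLIP text-encoder call, and `sample` is a toy loop that only shows the conditioning being reused across steps, assuming the prompt does not change during sampling.

```python
from functools import lru_cache

# Counts how many times the (stand-in) encoder really runs.
calls = {"n": 0}


def encode_text(prompt: str) -> tuple:
    """Hypothetical stand-in for an expensive CLIP text-encoder call."""
    calls["n"] += 1
    # Fake "embedding": one float per character, just to return something.
    return tuple(float(ord(c)) for c in prompt)


@lru_cache(maxsize=None)
def cached_encode(prompt: str) -> tuple:
    """Encode each distinct prompt once; later calls return the cached value."""
    return encode_text(prompt)


def sample(prompt: str, num_steps: int = 50) -> int:
    """Toy sampling loop that requests the conditioning at every step.

    Returns the total number of real encoder calls made so far.
    """
    for _ in range(num_steps):
        cond = cached_encode(prompt)  # cache hit after the first step
        _ = cond  # a real sampler would pass this to the denoiser
    return calls["n"]
```

With this arrangement the encoder runs once per distinct prompt regardless of the number of diffusion steps. Any actual speedup depends on how large the encoder cost is relative to the rest of the step.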