News

Shikai Li, Jianglin Fu, Kaiyuan Liu, Wentao Wang, Kwan-Yee Lin, Wayne Wu. Abstract: We present CosmicMan, a text-to-image foundation model specialized for ...
Meta recently open-sourced Massively Multilingual & Multimodal Machine Translation (SeamlessM4T), a multilingual translation AI that can translate both speech audio and text data across nearly 100 languages ...
We are excited to announce the release of TokenFD, the first token-level visual foundation model specifically tailored for text-image-related tasks, designed to support a variety of traditional ...
Google’s next-generation text-to-image foundation model is coming to the company’s Vertex AI platform. Imagen 3 will be available for select customers in preview, offering developers faster ...
Vertex AI Studio is an online environment for building AI apps, featuring Gemini, Google’s own multimodal generative AI model that can work with text, code, audio, images, and video. In addition ...
This new text-to-speech AI model understands what it's saying: how to try it for free (ZDNET). I tested Hume's new Octave model and was impressed with the results. Now you can try it, too. ...
The company took a step in another technological direction by launching its first stand-alone speech-to-text model, called Scribe. The startup, valued at $3.3 billion, ...
Abstract: We present CosmicMan, a text-to-image foundation model specialized for generating high-fidelity human images. Unlike current general-purpose foundation models that are stuck in the dilemma ...