PyTorch implementation of a generative model for high-fidelity audio generation from text prompts.
Zero-shot voice conversion and singing voice conversion with real-time support and fine-tuning capabilities.
A powerful framework for building real-time voice AI agents.
Instant voice cloning by MIT and MyShell. Audio foundation model.
Fuse ChatTTS with OpenVoice to clone your personalized voice from an uploaded 10-second audio clip.
Real-time voice interactive digital human supporting customizable appearance and voice with low latency.
Foundational Models for State-of-the-Art Speech and Text Translation.
Orpheus TTS is an open-source system for human-sounding speech synthesis using a Llama-3b backbone.
TTSFM is a reverse-engineered API server mirroring OpenAI's TTS service for text-to-speech conversion.
Qwen2.5-Omni is an end-to-end multimodal model by Alibaba Cloud, capable of understanding text, audio, vision, and video.
Demo app for Groq plugins in LiveKit Agents.
A simple voice generation tool that converts text to natural speech using the CosyVoice2 model.