VibeVoice is a community-maintained fork for expressive, long-form conversational speech synthesis.
E2M converts various file types into Markdown; it is open source, flexible, and easy to install.
Transform PDFs into AI-generated podcasts for engaging on-the-go audio content.
PyTorch implementation of a generative model that produces high-fidelity audio from text prompts.
Orpheus TTS is an open-source system for human-sounding speech synthesis built on a Llama-3b backbone.
TTSFM is a reverse-engineered API server mirroring OpenAI's TTS service for text-to-speech conversion.
Animation testing based on Bert-VITS2 for generating facial expressions and body animations from audio input.