A demo for recording audio and video streams with simultaneous speech and face recognition.
Faster Whisper transcription with CTranslate2.
A Conversational Speech Generation Model that generates audio codes from text and audio inputs.
Speech to Text, but with all the bells and whistles and, most importantly, AI!
Spark-TTS is an advanced text-to-speech system using large language models for natural-sounding voice synthesis.
The Python library for real-time communication.
A local AI-powered tool that converts PDF documents into engaging audio using local LLMs and TTS models.
A sound cloning tool with a web interface, using your voice or any sound to record audio.
Robust Speech Recognition via Large-Scale Weak Supervision.
A feature-rich command-line audio/video downloader with support for thousands of sites.