
A demo for recording audio and video streams with simultaneous speech and face recognition.

Faster Whisper transcription with CTranslate2.

A conversational speech generation model that generates audio codes from text and audio inputs.

Speech-to-text with all the bells and whistles and, most importantly, AI!

Spark-TTS is an advanced text-to-speech system using large language models for natural-sounding voice synthesis.

The Python library for real-time communication.

An AI-powered tool that converts PDF documents into engaging audio using locally run LLMs and TTS models.

A sound cloning tool with a web interface that lets you record audio using your own voice or any other sound.