
Qwen2.5-Omni is an end-to-end multimodal model by Alibaba Cloud, capable of understanding text, audio, images, and video.

Demo app for Groq plugins in LiveKit Agents.

A simple voice generation tool that converts text to natural speech using the CosyVoice2 model.

Transforms research papers into engaging three-person podcast discussions for a fresh listening experience.

A third-party music player providing local services, desktop lyrics, music downloads, and high sound quality.

Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.

A multilingual large voice generation model with full-stack support for inference, training, and deployment.

A SOTA open-source TTS model for high-quality speech synthesis with multilingual support.

A one-stop solution for creating digital avatars from WeChat chat history using fine-tuned large language models.

An LLM-powered video translation and dubbing tool offering professional-grade translations and one-click deployment.

Speech-AI-Forge is a project centered on TTS generation, offering an API Server and a Gradio-based WebUI.

AbletonMCP connects Ableton Live to Claude AI, enabling AI-assisted music production and session manipulation.