A Conversational Speech Generation Model that generates audio codes from text and audio inputs.
CSM is a speech generation model developed by SesameAILabs. It generates RVQ (residual vector quantization) audio codes from both text and audio inputs. Its architecture pairs a Llama backbone with a dedicated audio decoder that produces Mimi audio codes.