CSM

A Conversational Speech Generation Model that generates audio codes from text and audio inputs.

Introduction

CSM (Conversational Speech Model)

CSM is a speech generation model developed by SesameAILabs. It generates residual vector quantization (RVQ) audio codes from text and audio inputs, using a Llama backbone paired with a dedicated audio decoder that produces Mimi audio codes.
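To make "RVQ audio codes" concrete, here is a minimal pure-Python sketch of residual vector quantization. It is an illustration only: the codebooks, the two-dimensional input, and the function names (`nearest`, `rvq_encode`, `rvq_decode`) are made up for this example and are not taken from CSM or the Mimi codec. Each stage picks the closest codebook entry and passes the leftover residual to the next stage, so a vector is represented by one integer code per stage.

```python
# Toy residual vector quantization (RVQ) in pure Python.
# Assumption: codebooks and input are invented for illustration; they do not
# come from CSM or Mimi.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage refines what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def rvq_decode(codebooks, codes):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Hypothetical two-stage codebooks over 2-D vectors.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # fine stage
]

codes = rvq_encode(CODEBOOKS, [1.2, 0.9])
approx = rvq_decode(CODEBOOKS, codes)
print(codes, approx)
```

In this toy setup the input `[1.2, 0.9]` is stored as two small integers, one per stage, and decoding sums the chosen entries to approximate the original vector. A real neural codec learns its codebooks and operates on frames of audio features, which this sketch does not attempt to model.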

Key Features:
  • Audio Generation: Generates high-quality audio from text prompts.
  • Contextual Understanding: Conditions generation on prior conversational context, which yields more natural-sounding dialogue.
  • Open Source: Available for research and educational purposes under the Apache-2.0 license.
  • Multi-Platform Support: Compatible with various operating systems, including Windows and Linux.

Benefits:
  • Research and Development: Ideal for researchers looking to explore conversational AI and speech synthesis.
  • Interactive Demos: Includes a fine-tuned variant that powers interactive voice demos.
  • Community Contributions: Encourages contributions and collaboration through GitHub.

Highlights:
  • Latest Release: The 1B CSM variant was released on March 13, 2025, with checkpoints hosted on Hugging Face.
  • Ethical Use: The project emphasizes responsible, ethical use of the technology and prohibits misuse.

Information

  • Publisher
    AISecKit
  • Website
    github.com
  • Published date
    2025/04/28
