MoshiVis is a Vision Speech Model (VSM) integrating speech and image processing for interactive conversations.

Nano Bananary is an AI batch image and video generator with 142 effects.

AI Podcast Generator creates bilingual episodes in multiple languages and is an alternative to NotebookLM.

Zero-Config Code Flow for Claude Code & Codex, enabling seamless integration and configuration for AI development.
MoshiVis is a Vision Speech Model (VSM) designed for spoken discussion of images in a natural conversational style. It builds on the Moshi speech model and adds 206M adapter parameters on top of the base model.