An audio foundation model for instant voice cloning, from MIT and MyShell.
Foundational Models for State-of-the-Art Speech and Text Translation.
TripoSG is a high-fidelity image-to-3D generation model built on rectified flow transformers.
A comprehensive collection of papers focused on evaluating large language models (LLMs).
FlagEval is an evaluation toolkit for large AI foundation models.
A collection of high-quality pretrained models and resources for Chinese natural language processing.
A course for getting started with large language models (LLMs), with roadmaps and Colab notebooks.