A unified toolkit for automatic evaluations of large language models (LLMs).

Evalchemy is a unified, easy-to-use toolkit for evaluating post-trained large language models (LLMs). Developed by the DataComp community and Bespoke Labs, it builds on the LM-Eval-Harness to provide a comprehensive solution for model evaluation.
Evalchemy aims to make running common benchmarks simple, fast, and versatile, which makes it a useful tool for researchers and developers working with LLMs.