A unified toolkit for automatic evaluations of large language models (LLMs).
Evalchemy is a unified, easy-to-use toolkit for evaluating post-trained large language models (LLMs). Developed by the DataComp community and Bespoke Labs, it builds on the LM-Eval-Harness to provide a comprehensive solution for model evaluation.
Evalchemy makes running common benchmarks simple, fast, and versatile for researchers and developers working with LLMs.