
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
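A minimal sketch of how an eval is typically run, assuming the `oaieval` command-line tool that ships with the Evals repository; the completion function ("gpt-3.5-turbo") and eval name ("test-match") are illustrative placeholders from the project's documented usage.

```python
# Minimal sketch: invoking the oaieval CLI that ships with the Evals repo.
# The completion function and eval name below are illustrative; substitute
# your own registered eval and model.
import subprocess

def run_eval(completion_fn: str, eval_name: str) -> None:
    """Run a single registered eval and stream its output to the console."""
    subprocess.run(["oaieval", completion_fn, eval_name], check=True)

if __name__ == "__main__":
    run_eval("gpt-3.5-turbo", "test-match")
```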

Open-source framework for evaluating and testing AI and LLM systems for performance, bias, and security issues.

Phoenix is an open-source AI observability platform for experimentation, evaluation, and troubleshooting.
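A minimal sketch of getting the Phoenix UI running locally, assuming the `arize-phoenix` Python package; `launch_app()` starts the local observability server so traces and evaluations can be inspected in the browser.

```python
# Minimal sketch, assuming the arize-phoenix package (pip install arize-phoenix).
# launch_app() starts the local Phoenix server and returns a session object
# whose URL points at the observability UI.
import phoenix as px

session = px.launch_app()   # start the local Phoenix server
print(session.url)          # open this URL in a browser to explore traces
```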

A unified toolkit for the automatic evaluation of large language models (LLMs).

An open-source project for comparing two LLMs head-to-head with a given prompt, focusing on backend integration.

Open-source evaluation toolkit for large multi-modality models, supporting 220+ models and 80+ benchmarks.
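A minimal sketch of kicking off a benchmark run, assuming VLMEvalKit's `run.py` entry point with `--data` and `--model` flags; the benchmark ("MMBench_DEV_EN") and model name ("qwen_chat") are illustrative values, not a recommendation.

```python
# Minimal sketch: shelling out to VLMEvalKit's run.py entry point from the
# repository root. The benchmark and model names are illustrative; both
# flags are assumptions based on the project's documented usage.
import subprocess

subprocess.run(
    ["python", "run.py", "--data", "MMBench_DEV_EN", "--model", "qwen_chat"],
    check=True,
)
```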

From bare metal to production-ready in minutes; run your own Fly.io-style server on your VPS.

A high-performance, eBPF-based transparent proxy solution for Linux.

EuroBERT is a multilingual encoder model designed for European languages, trained using the Optimus training library.
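A minimal sketch of loading the encoder with Hugging Face transformers, assuming the weights are published on the Hub under an ID like "EuroBERT/EuroBERT-210m"; the model ID and the need for `trust_remote_code` are assumptions, not confirmed details from the description above.

```python
# Minimal sketch: loading a EuroBERT checkpoint with Hugging Face transformers.
# The model ID "EuroBERT/EuroBERT-210m" is an assumption about where the
# weights are published; trust_remote_code is only needed if the checkpoint
# ships custom modeling code.
from transformers import AutoModel, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("EuroBERT encodes European languages.", return_tensors="pt")
embeddings = model(**inputs).last_hidden_state  # token-level encoder outputs
```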