
EuroBERT is a multilingual encoder model designed for European languages, trained using the Optimus training library.

An inference server and UI toolkit for prompt management, versioning, testing, and evaluation; provider-agnostic and compatible with the OpenAI API.

Python SDK for an AI agent observability, monitoring, and evaluation framework.

A comprehensive library for implementing LLMs with a unified training pipeline and model evaluation.

A guidebook sharing insights and knowledge about evaluating Large Language Models (LLMs).

A comprehensive collection of papers focused on evaluating large language models (LLMs).

Automatically evaluate your LLMs in Google Colab with LLM AutoEval.

A customizable framework for efficient large model evaluation and performance benchmarking.

An efficient full-parameter tuning library for reinforcement learning applications in LLMs.

A front-end framework for a large-model testing toolkit, supporting efficient testing and collaborative evaluation of large language models.

A scenario-based testing toolbox for automating evaluations of large language models.