EuroBERT is a multilingual encoder model designed for European languages, trained using the Optimus training library.
An inference server and UI toolkit for prompt management, versioning, testing, and evaluation; provider-agnostic and OpenAI API-compatible.
Python SDK for an agent AI observability, monitoring, and evaluation framework.
A comprehensive library for implementing LLMs with a unified training pipeline and model evaluation.
A guidebook sharing insights and knowledge about evaluating large language models (LLMs).
A comprehensive collection of papers focused on evaluating large language models (LLMs).
Automatically evaluate your LLMs in Google Colab with LLM AutoEval.
A customizable framework for efficient large model evaluation and performance benchmarking.
Efficient full-parameter tuning library for reinforcement learning applications in LLMs.
Front-end framework for a large-model testing toolkit, supporting efficient testing and collaborative evaluation of large language models.
Scenario-based large-model testing toolbox for automating evaluations of large language models.