
Genspark2API is a deployment tool for AI applications with various integration and configuration options.

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Phoenix is an open-source AI observability platform for experimentation, evaluation, and troubleshooting.

A unified toolkit for automatic evaluations of large language models (LLMs).

An open-source project for comparing two LLMs head-to-head with a given prompt, focusing on backend integration.

Open-source evaluation toolkit for large multimodal models, supporting 220+ models and 80+ benchmarks.

EuroBERT is a multilingual encoder model designed for European languages, trained using the Optimus training library.

Automatable GenAI scripting for programmatically assembling LLM prompts in JavaScript.