Explore by tags

promptbench
A unified evaluation framework for large language models.
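A minimal sketch of the kind of evaluation loop promptbench supports, loosely following its quickstart; the `pb.DatasetLoader` and `pb.LLMModel` names are recalled from its documentation and may differ between versions, and the prompt template, dataset keys, and accuracy bookkeeping are plain Python added here for illustration.

```python
# Sketch of a promptbench-style evaluation loop (API names assumed from the
# project's quickstart; verify against the installed version).
import promptbench as pb

# Load a built-in classification dataset and a model wrapper (assumed helpers).
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="gpt-3.5-turbo", max_new_tokens=10)

prompt = "Classify the sentence as positive or negative: {content}"

correct = 0
for example in dataset:
    # Fill the template with the example text (plain Python formatting).
    output = model(prompt.format(content=example["content"]))
    predicted = "positive" if "positive" in output.lower() else "negative"
    expected = "positive" if example["label"] == 1 else "negative"
    correct += int(predicted == expected)

print(f"accuracy: {correct / len(dataset):.3f}")
```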

promptfoo
Promptfoo is a locally run tool for testing LLM applications, covering security evaluations and performance comparisons.

LLM-Evaluation
Sample notebooks and prompts for evaluating large language models (LLMs) and generative AI.

Evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
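To give a feel for what such a framework automates, here is a generic exact-match check in plain Python. It is not the Evals API itself, just the kind of benchmark assertion a registry entry typically encodes; the `ask_model` callable and the sample cases are placeholders.

```python
# Generic exact-match eval, illustrating the kind of check an eval framework
# runs over a registry of benchmark cases. Not the Evals API; `ask_model`
# and the sample cases below are placeholders.
from typing import Callable, List, Tuple

def run_exact_match_eval(ask_model: Callable[[str], str],
                         cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases whose answer matches exactly (case-insensitive)."""
    hits = 0
    for prompt, expected in cases:
        answer = ask_model(prompt).strip().lower()
        hits += int(answer == expected.strip().lower())
    return hits / len(cases) if cases else 0.0

if __name__ == "__main__":
    # Placeholder "model" so the sketch runs without any API key.
    fake_model = lambda prompt: "paris" if "capital of France" in prompt else "unknown"
    cases = [("What is the capital of France?", "Paris"),
             ("What is the capital of Spain?", "Madrid")]
    print(f"exact-match score: {run_exact_match_eval(fake_model, cases):.2f}")
```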

genaiscript
Automatable GenAI Scripting for programmatically assembling prompts for LLMs using JavaScript.

Prompty
Prompty simplifies the creation, management, debugging, and evaluation of LLM prompts for AI applications.

Agenta
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

LLM_in_Action
Large Language Model in Action is a GitHub repository that demonstrates practical implementations and applications of large language models.

mcp-go
A Go implementation of the Model Context Protocol (MCP) for LLM applications.

llmkit
A provider-agnostic, OpenAI-API-compatible inference server and UI toolkit for prompt management, versioning, testing, and evaluation.
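Because llmkit exposes an OpenAI-compatible endpoint, any OpenAI client can point at it by overriding the base URL. A minimal sketch using the official `openai` Python SDK follows; the localhost URL, port, API key, and model name are assumptions for illustration, not values documented by llmkit.

```python
# Talking to an OpenAI-compatible server (such as llmkit advertises) by
# overriding the SDK's base URL. The URL, API key, and model name below are
# placeholders; substitute whatever your deployment actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="local-placeholder-key",      # placeholder credential
)

response = client.chat.completions.create(
    model="my-prompt-v1",  # hypothetical prompt/model identifier
    messages=[{"role": "user", "content": "Summarize what prompt versioning is."}],
)
print(response.choices[0].message.content)
```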

BAML
The AI framework that adds engineering to prompt engineering, compatible with multiple programming languages.