
A web-based tool for processing images and converting documents with a simple interface.

Build effective agents using the Model Context Protocol (MCP) and simple workflow patterns.

A unified evaluation framework for large language models.

Promptfoo is a local tool for testing LLM applications, with support for security evaluations and performance comparisons.

AutoAudit is a large language model (LLM) designed to enhance cybersecurity through AI-driven threat detection and response.

A curated list of tools, datasets, demos, and papers for evaluating large language models (LLMs).

Sample notebooks and prompts for evaluating large language models (LLMs) and generative AI.

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".