Open-source framework for evaluating and testing AI and LLM systems for performance, bias, and security issues.
Phoenix is an open-source AI observability platform for experimentation, evaluation, and troubleshooting.
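As a minimal illustration of how Phoenix is typically started locally (a sketch assuming the `arize-phoenix` Python package and its `launch_app()` entry point):

```python
import phoenix as px

# Start a local Phoenix instance; traces, evals, and datasets collected
# during the session can be inspected in the browser at the printed URL.
session = px.launch_app()
print(session.url)
```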
A unified toolkit for the automatic evaluation of large language models (LLMs).
An open-source project for comparing two LLMs head-to-head with a given prompt, focusing on backend integration.
A study evaluating geopolitical and cultural biases in large language models through dual-layered assessments.
A Chinese legal dialogue language model designed to provide professional and reliable answers to legal questions.
Automatable GenAI Scripting: assemble prompts for LLMs programmatically using JavaScript.
Prompty simplifies the creation, management, debugging, and evaluation of LLM prompts for AI applications.
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Large Language Model in Action is a GitHub repository demonstrating various implementations and applications of large language models.
Official Firecrawl MCP Server - adds powerful web scraping to Cursor, Claude, and any other LLM client.
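A minimal sketch of wiring the server into an MCP client configuration (for example Claude Desktop's `claude_desktop_config.json`), assuming the `firecrawl-mcp` npm package name and the `FIRECRAWL_API_KEY` environment variable:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_KEY": "fc-YOUR_API_KEY" }
    }
  }
}
```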